Notes on spellcheck
This program spellchecks a single web page. It is a research project that will help me learn the Java Application Framework, HTTPClient, and the basics of downloading, parsing, and spellchecking. What I learn here will carry over to another project I am working on.
Goal 1: Build an object that can download anything from the internet over HTTP, and eventually FTP and HTTPS, switching seamlessly between protocols. It should be constructed from a configuration that holds its settings. It should expose an interface for monitoring progress and handling the data as it arrives. It should be able to run in a thread or a Task. Finally, it should be possible to stop a download and then either resume it from where it stopped or start it over.
Here is an example of how the code should work.
// Design sketch: DownloadInterface and Download are the classes planned in Goal 1;
// url, username, password, restarting, and bytesAlreadyDownloaded are assumed fields of this Task.
public class Task implements DownloadInterface {
    @Override
    public void doChunk(byte[] chunk) {
        // store the bytes to a file or inspect (parse) them
    }
    @Override
    public void doStatusChange(Status status) {
        // inform the view of the status change: connecting, downloading, complete, error
        // error checking takes place here:
        //   - the requested range is not valid or not supported
        //   - standard connection errors
        //   - login errors
    }
    @Override
    public void doProgress(int percentComplete) {
        // update the view
    }
    @Override
    public void doFileType(String type) {
        // change how doChunk handles the data based on the file type
    }
    @Override
    public void doFinished() {
        // close the file
        // close the parser
    }
    public void doInBackground() {
        Download d = new Download();
        d.setURL(url);
        d.setUserInfo(username, password);
        // d.set...
        if (restarting) {
            d.setStartByte(bytesAlreadyDownloaded); // where the download left off
        }
        d.execute();
        // check for errors, then restart or let the user fix the problem
    }
}
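To make the callback side of this design concrete, here is a minimal runnable sketch. The Status enum (including its STOPPED value) and the ChunkPump class are hypothetical stand-ins for the planned Download object, not an existing API. ChunkPump reads any InputStream in 8 KB chunks, hands each chunk to a DownloadInterface, reports progress, and can be stopped from another thread; the byte count it returns is what would later be passed to setStartByte when resuming. Opening the HTTP connection is deliberately left out so the loop can run without a network.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical status values; STOPPED is an addition to the four listed in the notes.
enum Status { CONNECTING, DOWNLOADING, STOPPED, COMPLETE, ERROR }

// Callback interface with the same methods as the design sketch.
interface DownloadInterface {
    void doChunk(byte[] chunk);
    void doStatusChange(Status status);
    void doProgress(int percentComplete);
    void doFileType(String type);
    void doFinished();
}

// Hypothetical stand-in for the Download object: it only sees an InputStream,
// so it does not depend on which protocol opened the stream.
class ChunkPump {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    // Safe to call from another thread, e.g. a Stop button handler.
    void stop() {
        stopRequested.set(true);
    }

    // Returns the number of bytes read; this is the offset to resume from later.
    long pump(InputStream in, long contentLength, String contentType, DownloadInterface listener) {
        listener.doFileType(contentType);
        listener.doStatusChange(Status.DOWNLOADING);
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream stream = in) {
            int n;
            while (!stopRequested.get() && (n = stream.read(buffer)) != -1) {
                listener.doChunk(Arrays.copyOf(buffer, n)); // copy, since buffer is reused
                total += n;
                if (contentLength > 0) {
                    listener.doProgress((int) (total * 100 / contentLength));
                }
            }
            listener.doStatusChange(stopRequested.get() ? Status.STOPPED : Status.COMPLETE);
        } catch (IOException e) {
            listener.doStatusChange(Status.ERROR);
        } finally {
            listener.doFinished();
        }
        return total;
    }
}
```

Because the pump only sees an InputStream, the same loop would serve a stream opened over HTTP, FTP, or from a local file, which is the protocol independence this design aims for.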
Advantages of this design:
--Can download to any OutputStream.
--Can inspect the content of a file without saving it anywhere.
--The developer is in charge of saving to a file and of restarting a download from a given byte range.
--Is not protocol-dependent: it can use HTTP, FTP, HTTPS, SMB, or any other way of moving files.
Disadvantages:
--Protocol-specific features (cookies, HTTPS certificates) break this abstraction.
Protocol-specific issues can be handled through the configuration object, and none of them affect spellcheck.
Goal 2: DownloadConfiguration
This object configures a download so that it can connect to a resource. The programmer shouldn't have to configure every download separately; instead, settings are made once in this object and passed to each new download. Its only job is to supply the information HTTPClient requires.
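A minimal sketch of such a configuration object follows. It assumes java.net.HttpURLConnection in place of HTTPClient so the example needs only the JDK, and it assumes credentials are sent with HTTP Basic authentication. The field set (user info, connect and read timeouts, redirect following, user agent), the defaults, and the Builder are illustrative choices, not a fixed API. The object is immutable, so one instance can safely be shared by every download.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Immutable settings shared by every download; built once, then passed to each new download.
final class DownloadConfiguration {
    final String username;          // null means no authentication
    final String password;
    final int connectTimeoutMillis;
    final int readTimeoutMillis;
    final boolean followRedirects;
    final String userAgent;

    private DownloadConfiguration(Builder b) {
        username = b.username;
        password = b.password;
        connectTimeoutMillis = b.connectTimeoutMillis;
        readTimeoutMillis = b.readTimeoutMillis;
        followRedirects = b.followRedirects;
        userAgent = b.userAgent;
    }

    // Applies the settings to a connection that has been opened but not yet connected.
    void applyTo(HttpURLConnection conn) {
        conn.setConnectTimeout(connectTimeoutMillis);
        conn.setReadTimeout(readTimeoutMillis);
        conn.setInstanceFollowRedirects(followRedirects);
        conn.setRequestProperty("User-Agent", userAgent);
        if (username != null) {
            String token = Base64.getEncoder().encodeToString(
                    (username + ":" + password).getBytes(StandardCharsets.UTF_8));
            conn.setRequestProperty("Authorization", "Basic " + token); // assumed: Basic auth
        }
    }

    // Opens (but does not connect) a configured connection for an http or https URL.
    HttpURLConnection open(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        applyTo(conn);
        return conn;
    }

    static final class Builder {
        private String username;
        private String password;
        private int connectTimeoutMillis = 10_000;  // illustrative defaults
        private int readTimeoutMillis = 30_000;
        private boolean followRedirects = true;
        private String userAgent = "spellcheck";

        Builder userInfo(String user, String pass) { username = user; password = pass; return this; }
        Builder connectTimeout(int millis) { connectTimeoutMillis = millis; return this; }
        Builder readTimeout(int millis) { readTimeoutMillis = millis; return this; }
        Builder followRedirects(boolean follow) { followRedirects = follow; return this; }
        Builder userAgent(String agent) { userAgent = agent; return this; }
        DownloadConfiguration build() { return new DownloadConfiguration(this); }
    }
}
```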
Goal 3: Logging
I only see a need for a simple logger, used for testing. Eventually there will be multiple tasks, and logging would not be useful in that kind of application. Logging should therefore happen only inside a Task and be specific to that task. That works for the spellcheck app because only one task runs at a time. The logger can support different levels, but I don't want to spend too much time on it; verbose and normal levels should be enough for this project.
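A minimal sketch of this per-task logger, with only the two levels mentioned. The class name TaskLog and its methods are hypothetical. Each Task would create its own instance; keeping the lines in memory (an assumption) lets a test or the view read them back.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

// Per-task logger with the two levels from the notes; each Task owns its own instance.
final class TaskLog {
    enum Level { NORMAL, VERBOSE }

    private final String taskName;
    private final boolean verbose;                        // when false, VERBOSE messages are dropped
    private final PrintStream out;
    private final List<String> lines = new ArrayList<>(); // kept so a test or view can read them back

    TaskLog(String taskName, boolean verbose, PrintStream out) {
        this.taskName = taskName;
        this.verbose = verbose;
        this.out = out;
    }

    void log(Level level, String message) {
        if (level == Level.VERBOSE && !verbose) {
            return;
        }
        String line = "[" + taskName + "] " + level + ": " + message;
        lines.add(line);
        out.println(line);
    }

    void normal(String message) { log(Level.NORMAL, message); }

    void verbose(String message) { log(Level.VERBOSE, message); }

    // Returns a copy so callers cannot modify the log.
    List<String> lines() { return new ArrayList<>(lines); }
}
```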
So let's review the basics of this application.
1. The user enters a URL.
2. A task starts and blocks the user until it finishes (Java App Framework).
3. The task downloads the URL and, while downloading, does the following:
a. checks the content type first and makes sure it is text
b. logs what is going on (verbose logging is optional)
c. displays the content as it is downloaded
d. parses and displays the words found
e. checks the spelling of the words and displays the misspelled words with suggestions
4. When the task finishes, the user can start over.
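The content-type check, word parsing, and spelling check from step 3 can be sketched without any libraries. This is a minimal sketch under several assumptions: "text" means a text/* media type or XHTML, tags are stripped with a crude regular expression rather than a real HTML parser, and a plain Set of words stands in for whatever spellchecking library the project ends up using. Suggestions are dictionary words one edit (deletion, transposition, replacement, or insertion) away from the misspelling, a common simple technique swapped in here. PageSpellcheck is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PageSpellcheck {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+(?:'[A-Za-z]+)?");

    // Content-type check: text/* or XHTML; parameters such as "; charset=UTF-8" are ignored.
    static boolean isText(String contentType) {
        if (contentType == null) return false;
        String type = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return type.startsWith("text/") || type.equals("application/xhtml+xml");
    }

    // Word parsing: crude tag removal with a regex, then lower-cased words.
    static List<String> words(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(TAG.matcher(html).replaceAll(" "));
        while (m.find()) result.add(m.group().toLowerCase(Locale.ROOT));
        return result;
    }

    // Spelling check: each misspelled word mapped to sorted suggestions one edit away.
    static Map<String, List<String>> misspelled(List<String> words, Set<String> dictionary) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String w : words) {
            if (dictionary.contains(w) || result.containsKey(w)) continue;
            Set<String> suggestions = new TreeSet<>();
            for (String candidate : edits1(w)) {
                if (dictionary.contains(candidate)) suggestions.add(candidate);
            }
            result.put(w, new ArrayList<>(suggestions));
        }
        return result;
    }

    // All strings one deletion, transposition, replacement, or insertion away from w.
    private static Set<String> edits1(String w) {
        String letters = "abcdefghijklmnopqrstuvwxyz";
        Set<String> out = new HashSet<>();
        for (int i = 0; i <= w.length(); i++) {
            String left = w.substring(0, i);
            String right = w.substring(i);
            if (!right.isEmpty()) out.add(left + right.substring(1));                   // delete
            if (right.length() > 1)
                out.add(left + right.charAt(1) + right.charAt(0) + right.substring(2)); // transpose
            for (char c : letters.toCharArray()) {
                if (!right.isEmpty()) out.add(left + c + right.substring(1));           // replace
                out.add(left + c + right);                                              // insert
            }
        }
        return out;
    }
}
```

In the real task the text arrives in chunks, so a word split across two chunks would need to be carried over to the next chunk; the sketch works on a complete string.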
Some features are needed for this app, and others will be used in the future. These are the features spellcheck needs:
1. A Download object that can download HTTP content
2. Stopping a download after it starts and before it ends
3. Displaying the content while it is being downloaded
4. Displaying the time the download takes
5. Displaying the content length, bytes downloaded, and download time
6. Parsing the words and displaying them
7. Spellchecking the words and displaying the misspelled ones
8. Displaying a message box to the user if errors occur
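Features 4 and 5 need only a few counters. Here is a minimal sketch; the class name DownloadStats is hypothetical, the clock is injectable (an assumption made so the example can be tested with a fake clock), and a content length of -1 stands for a server that sends no Content-Length header.

```java
import java.util.function.LongSupplier;

// Tracks what features 4 and 5 display: content length, bytes downloaded, and elapsed time.
final class DownloadStats {
    private final LongSupplier nanoClock;   // injectable so tests can use a fake clock
    private final long contentLength;       // -1 when the server sends no Content-Length
    private long startNanos;
    private long bytes;

    DownloadStats(long contentLength) {
        this(contentLength, System::nanoTime);
    }

    DownloadStats(long contentLength, LongSupplier nanoClock) {
        this.contentLength = contentLength;
        this.nanoClock = nanoClock;
    }

    void start() {
        startNanos = nanoClock.getAsLong();
        bytes = 0;
    }

    void addBytes(int n) {
        bytes += n;
    }

    long bytesDownloaded() {
        return bytes;
    }

    long elapsedMillis() {
        return (nanoClock.getAsLong() - startNanos) / 1_000_000;
    }

    // Text for the view, for example "1500 of 3000 bytes in 250 ms".
    String summary() {
        String total = contentLength >= 0 ? String.valueOf(contentLength) : "unknown";
        return bytes + " of " + total + " bytes in " + elapsedMillis() + " ms";
    }
}
```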
These are general download features that any application may want:
1. Retrying a download a set number of times, waiting between retries, before giving up
2. Stopping a download and resuming where it left off, even if the program shuts down in between
3. Setting a maximum daily bandwidth for all downloads
4. Proxy server settings for each protocol
5. Setting a maximum number of hops (redirects) a download will follow before it starts downloading
6. Setting a connection timeout and a read timeout, and being able to display when a timeout error occurs
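Features 1 and 2 can be sketched with the JDK alone. In this minimal sketch, withRetries runs an attempt up to a fixed number of times and sleeps between failures; resumeOffset takes the size of the partially downloaded file as the resume point, which still works after the program restarts; requestFrom adds an HTTP Range header to a not-yet-connected HttpURLConnection. DownloadRetry and its methods are hypothetical names, and the fixed wait (rather than, say, exponential backoff) is a simplifying assumption.

```java
import java.io.File;
import java.net.HttpURLConnection;
import java.util.concurrent.Callable;

final class DownloadRetry {
    // Feature 1: run an attempt up to maxAttempts times, sleeping waitMillis between failures.
    static <T> T withRetries(Callable<T> attempt, int maxAttempts, long waitMillis) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                last = e;
                if (i < maxAttempts) Thread.sleep(waitMillis);
            }
        }
        throw last; // every attempt failed; rethrow the most recent failure
    }

    // Feature 2: resume from however many bytes are already on disk, which survives a restart.
    static long resumeOffset(File partial) {
        return partial.exists() ? partial.length() : 0;
    }

    // Ask the server for the rest of the file; a 206 response means the range was honoured.
    static void requestFrom(HttpURLConnection conn, long offset) {
        if (offset > 0) conn.setRequestProperty("Range", "bytes=" + offset + "-");
    }
}
```

A server that does not support ranges answers 200 with the whole file instead of 206 Partial Content, so the caller should check getResponseCode() and start over in that case. A timed-out connect or read surfaces as java.net.SocketTimeoutException, which is an IOException, so withRetries retries it like any other failure.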