The General tab:


A Website has a structure like a tree. The root file (Start URL) contains links to other pages, which in turn contain links to further pages, and so on. Since on commercial sites this can go on almost endlessly, the Webcrawler has to stop at some point (Max. Depth). But even if you specify a Max. Depth of only 1 or 2, a single HTML file can contain so many links that you still end up downloading too many files. To prevent this you can specify a Max. Node#, which tells the Crawler the maximum number of files to download.
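
Conceptually, the Crawler walks this tree level by level and stops as soon as either limit is reached. The Java sketch below only illustrates that idea; it is not the Crawler's actual implementation. The method extractLinks() is a hypothetical placeholder for the HTML parsing step, and the variable names simply mirror the settings on this tab.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Illustrative sketch: shows how Max. Depth and Max. Node#
// bound a breadth-first crawl starting at the Start URL.
public class CrawlSketch {

    static class Node {
        final String url;
        final int depth;
        Node(String url, int depth) { this.url = url; this.depth = depth; }
    }

    // Hypothetical placeholder: a real crawler would download the page
    // here and parse its <a href="..."> links.
    static List<String> extractLinks(String url) {
        return new ArrayList<String>();
    }

    public static void main(String[] args) {
        String startUrl = "http://www.sun.com"; // Start URL
        int maxDepth = 2;                       // Max. Depth
        int maxNodes = 100;                     // Max. Node#

        Queue<Node> queue = new ArrayDeque<Node>();
        Set<String> visited = new HashSet<String>();
        queue.add(new Node(startUrl, 0));

        while (!queue.isEmpty() && visited.size() < maxNodes) {
            Node node = queue.remove();
            if (!visited.add(node.url)) {
                continue;                       // page already downloaded
            }
            if (node.depth < maxDepth) {        // stop descending at Max. Depth
                for (String link : extractLinks(node.url)) {
                    queue.add(new Node(link, node.depth + 1));
                }
            }
        }
        System.out.println("Downloaded " + visited.size() + " files.");
    }
}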

Start URL ..... Specify the URL at which the Webcrawler should start. This can be either an http or a file URL.
e.g.: http://www.sun.com
e.g.: file:///c:/html/index.html

Max. Depth ..... With the Max. Depth you can determine how "deep" into the tree the Webcrawler should dive. Of course, the higher you set this value, the longer you have to wait.

Max. Node# ..... Determines the maximum number of nodes you wish to download from the net.

To start the Crawler with the specified settings, click the Start Button.

To stop it, press the Stop Button.

In the information field below the buttons you can always see what the Crawler is currently doing.

Underneath that there is a progress bar that indicates whether the Crawler is working or not.