Webcrawler

Java application written by Christof Prenninger, 1997-1998
adviser: DI Markus A. Hof

The Webcrawler can be used for crawling through a whole site on the Internet or an intranet. You specify a start URL and the Crawler follows all links found in that HTML page. These usually lead to more links, which are followed in turn, and so on. A site can thus be seen as a tree structure: the root is the start URL, all links in the root HTML page are direct children of the root, and subsequent links are children of those children.
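The link-following idea above can be sketched in a few lines of Java. This is only an illustration, not code from the Webcrawler itself: the "site" is a hard-coded map of invented page names instead of real HTTP fetches, and pages are visited breadth-first while skipping anything already seen (real sites contain cycles, not just trees).

```java
import java.util.*;

public class CrawlSketch {
    // Hypothetical site: each page maps to the links found on it.
    static Map<String, List<String>> site = Map.of(
        "index.html", List.of("a.html", "b.html"),
        "a.html",     List.of("b.html", "c.html"),
        "b.html",     List.of(),
        "c.html",     List.of("index.html")   // link back to the root: a cycle
    );

    // Follow all links reachable from the start URL, visiting each page once.
    static List<String> crawl(String start) {
        List<String> visited = new ArrayList<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(start);
        while (!todo.isEmpty()) {
            String url = todo.poll();
            if (visited.contains(url)) continue;   // already seen: skip
            visited.add(url);
            todo.addAll(site.getOrDefault(url, List.of()));
        }
        return visited;
    }

    public static void main(String[] args) {
        // prints [index.html, a.html, b.html, c.html]
        System.out.println(crawl("index.html"));
    }
}
```

Without the "already seen" check the crawl would loop forever on the c.html → index.html link, which is why a real crawler must remember every page it has visited.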


This program is a Java application (it cannot be run as an applet) that implements the Model-View-Controller (MVC) pattern. The Crawler is the model; it does all the work. So far I have implemented two different controllers and one view. One controller is a simple StringFinder; the other is a Grabber that downloads a whole site onto the local hard disk. The view shows the tree structure of the specified site, plus an optional Tracer window that displays the internal work of the Crawler.
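The MVC split described above can be sketched as follows. All names here are my own invention, not the actual Webcrawler classes: the model notifies registered listeners about each page it visits, and a controller such as a StringFinder reacts to those notifications without the model knowing anything about it.

```java
import java.util.*;

// Listener interface: controllers and views implement this.
interface CrawlListener {
    void pageVisited(String url, String content);
}

// The model: in the real program this would fetch pages and follow links.
// Here it just reports one placeholder page to keep the sketch self-contained.
class Crawler {
    private final List<CrawlListener> listeners = new ArrayList<>();
    void addListener(CrawlListener l) { listeners.add(l); }

    void crawl(String startUrl) {
        String content = "<html>hello world</html>";   // placeholder page
        for (CrawlListener l : listeners) l.pageVisited(startUrl, content);
    }
}

// One controller: records the URLs of pages containing a search string.
class StringFinder implements CrawlListener {
    final String needle;
    final List<String> hits = new ArrayList<>();
    StringFinder(String needle) { this.needle = needle; }

    public void pageVisited(String url, String content) {
        if (content.contains(needle)) hits.add(url);
    }
}

public class MvcSketch {
    public static void main(String[] args) {
        Crawler crawler = new Crawler();
        StringFinder finder = new StringFinder("hello");
        crawler.addListener(finder);
        crawler.crawl("http://example.com/");
        System.out.println(finder.hits);   // URLs whose page contained "hello"
    }
}
```

Because the model only talks to the CrawlListener interface, a Grabber that writes pages to disk, or a view that builds the site tree, can be plugged in the same way without touching the Crawler.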


If you want to test this program, please download the following packages and unzip them to a new directory (e.g. C:/Program Files/Java/Webcrawler). Then view the readfirst.html file in that directory.
crawl.zip - The main program and help files.
classes.zip - The classes needed to run the program.
doc.zip - The documentation (HTML files); optional.
You will also need the Java JDK 1.1.6 and Swing 1.0.2, both from Sun.

Please mail comments to Christof Prenninger and check out my homepage. Thanks!