Webcrawler
(C) 1997-98, Christof Prenninger
10 credit project (Java)
What is it for?
How is it implemented?
The surrounding application creates one Controller (which creates one Crawler) and zero or more Visualizers. The same application also connects those objects together: every Visualizer it creates must be attached to the Crawler as an Observer, while the Crawler is connected to the Controller automatically.
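The wiring described above could look roughly like the following sketch. It assumes the Crawler is built on java.util.Observable (deprecated since Java 9 but still available); the classes are simplified stand-ins, and getCrawler(), announce() and lastChange are hypothetical names that are not taken from the project.

```java
import java.util.Observable;
import java.util.Observer;

// Simplified stand-in for the Crawler; only the Observer wiring is shown.
class Crawler extends Observable {
    // Announces a change to every attached Observer (hypothetical helper).
    void announce(String change) {
        setChanged();               // without this, notifyObservers() does nothing
        notifyObservers(change);
    }
}

// Stand-in for a Controller: it creates its own Crawler.
class Controller {
    private final Crawler crawler = new Crawler();

    Crawler getCrawler() { return crawler; }   // assumed accessor

    void start() { crawler.announce("crawl started"); }
}

// Stand-in for a Visualizer: it only remembers the last change it was told about.
class Visualizer implements Observer {
    String lastChange;

    @Override
    public void update(Observable source, Object change) {
        lastChange = (String) change;
    }
}

class WiringSketch {
    public static void main(String[] args) {
        Controller controller = new Controller();   // also creates the Crawler
        Visualizer first = new Visualizer();
        Visualizer second = new Visualizer();

        // The application attaches every Visualizer to the Crawler itself.
        controller.getCrawler().addObserver(first);
        controller.getCrawler().addObserver(second);

        controller.start();
        System.out.println(first.lastChange + " / " + second.lastChange);
    }
}
```

Running with zero Visualizers works the same way: the Crawler simply has nobody to notify.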
Crawler:
The Crawler holds the tree-structure and manipulates it while crawling through the site. Whenever the Crawler makes changes to the tree-structure, it sends out VisualizerMessages to all attached Visualizers. The Crawler is controlled by a Controller.
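As an illustration of "change the tree, then tell the Visualizers", here is a minimal sketch. The node type LinkNode, the fields of VisualizerMessage, and the methods addLink() and markVisited() are assumptions made for this example; the real classes may be organised differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Observable;

// One node of the site tree (hypothetical shape).
class LinkNode {
    final String url;
    final List<LinkNode> children = new ArrayList<>();
    boolean visited;

    LinkNode(String url) { this.url = url; }
}

// Assumed message layout: what happened, and to which node.
class VisualizerMessage {
    enum Kind { NODE_ADDED, NODE_VISITED }

    final Kind kind;
    final LinkNode node;

    VisualizerMessage(Kind kind, LinkNode node) {
        this.kind = kind;
        this.node = node;
    }
}

class Crawler extends Observable {
    private final LinkNode root;

    Crawler(String startUrl) { root = new LinkNode(startUrl); }

    LinkNode getRoot() { return root; }

    // Adds a child to the tree and reports the change.
    LinkNode addLink(LinkNode parent, String url) {
        LinkNode child = new LinkNode(url);
        parent.children.add(child);
        send(new VisualizerMessage(VisualizerMessage.Kind.NODE_ADDED, child));
        return child;
    }

    // Marks a node as visited and reports the change.
    void markVisited(LinkNode node) {
        node.visited = true;
        send(new VisualizerMessage(VisualizerMessage.Kind.NODE_VISITED, node));
    }

    // Every mutation of the tree ends with a message to all Observers.
    private void send(VisualizerMessage message) {
        setChanged();
        notifyObservers(message);
    }
}
```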
Controller:
The Controller controls the Crawler ("start!"), and the Crawler "asks" the Controller specific questions, such as "should I download this link?" or "where should I store the downloaded data?". Every Controller needs to implement the ControllerInterface in order to be accepted by the Crawler (more info in the ControllerInterface documentation). I implemented an abstract class, Controller, which, although it is abstract, handles most of the basic methods described in the ControllerInterface. All new Controllers should be derived from this class, and derived constructors/methods should always call the corresponding constructors/methods of the super-class.
So far I implemented 2 Controllers which are derived from Controller and can be used as examples for creating new Controllers:
When the Controller is created, it creates a Crawler object.
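The relationship between ControllerInterface, the abstract Controller and a derived Controller might be sketched as follows. The method names shouldDownload() and storageDirectory(), the Crawler method process(), and the example class SameHostController are all hypothetical; the real interface is described in the ControllerInterface documentation.

```java
// Hypothetical version of the interface: the questions the Crawler asks.
interface ControllerInterface {
    boolean shouldDownload(String link);      // "should I download this link?"
    String storageDirectory(String link);     // "where should I store the data?"
}

// Simplified stand-in for the Crawler: it only consults its Controller.
class Crawler {
    private final ControllerInterface controller;

    Crawler(ControllerInterface controller) { this.controller = controller; }

    // Returns where the link would be stored, or null if the Controller declines it.
    String process(String link) {
        if (!controller.shouldDownload(link)) {
            return null;
        }
        return controller.storageDirectory(link);
    }
}

// Abstract base class that supplies defaults for part of the interface.
abstract class Controller implements ControllerInterface {
    protected final Crawler crawler;

    protected Controller() {
        crawler = new Crawler(this);   // creating a Controller creates its Crawler
    }

    @Override
    public String storageDirectory(String link) {
        return "download";             // assumed default location
    }

    Crawler getCrawler() { return crawler; }
}

// Example of a derived Controller that only accepts links on one host.
class SameHostController extends Controller {
    private final String host;

    SameHostController(String host) {
        super();                       // always call the super-class constructor
        this.host = host;
    }

    @Override
    public boolean shouldDownload(String link) {
        return link.startsWith("http://" + host + "/");
    }
}
```

Keeping the defaults in the abstract class means a new Controller only has to override the decisions it actually wants to change.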
Visualizer:
I implemented one Visualizer which displays the tree-structure created by the Crawler in a Win95-style tree. Any other Observer that handles VisualizerMessages can be used to visually show the Crawler's work and the structure of the site. My Visualizer uses colors to show what the Crawler is currently doing at/with a link, icons to show the type of link and a JTree (Swing JFC) to show the structure of the site.
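A stripped-down Observer of this kind could be sketched as below. It keeps a Swing DefaultTreeModel in step with incoming messages and records a color per link. The message fields, the State values and the chosen colors are assumptions for illustration only, and the icon handling is left out.

```java
import java.awt.Color;
import java.util.HashMap;
import java.util.Map;
import java.util.Observable;
import java.util.Observer;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.DefaultTreeModel;

// Assumed message layout; the project's real VisualizerMessage may differ.
class VisualizerMessage {
    enum State { QUEUED, DOWNLOADING, DONE }

    final String parentUrl;   // null for the start page
    final String url;
    final State state;

    VisualizerMessage(String parentUrl, String url, State state) {
        this.parentUrl = parentUrl;
        this.url = url;
        this.state = state;
    }
}

class TreeVisualizer implements Observer {
    private final DefaultMutableTreeNode root = new DefaultMutableTreeNode("site");
    final DefaultTreeModel model = new DefaultTreeModel(root);
    private final Map<String, DefaultMutableTreeNode> nodes = new HashMap<>();
    private final Map<String, Color> colors = new HashMap<>();

    @Override
    public void update(Observable source, Object arg) {
        if (!(arg instanceof VisualizerMessage)) {
            return;                                   // ignore unrelated notifications
        }
        VisualizerMessage m = (VisualizerMessage) arg;
        DefaultMutableTreeNode node = nodes.get(m.url);
        if (node == null) {
            // First message about this link: add it below its parent (or the root).
            DefaultMutableTreeNode parent = nodes.getOrDefault(m.parentUrl, root);
            node = new DefaultMutableTreeNode(m.url);
            model.insertNodeInto(node, parent, parent.getChildCount());
            nodes.put(m.url, node);
        } else {
            model.nodeChanged(node);                  // ask listeners to repaint the node
        }
        colors.put(m.url, colorFor(m.state));
    }

    Color colorOf(String url) { return colors.get(url); }

    // Arbitrary example mapping from crawl state to display color.
    static Color colorFor(VisualizerMessage.State state) {
        switch (state) {
            case QUEUED:      return Color.GRAY;
            case DOWNLOADING: return Color.ORANGE;
            default:          return Color.GREEN;
        }
    }
}
```

To display the result, the model can be handed to a JTree via new JTree(visualizer.model), with a custom TreeCellRenderer that looks up colorOf() for each node. In a real GUI the model updates should happen on the Swing event dispatch thread.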
The surrounding application I wrote simply lets the user choose one of the 2 Controllers, creates the selected Controller (which creates a Crawler) and a Visualizer, and starts the program. Just try it and see for yourself. Check out Crawl.java to see how the objects are connected together.
For more details on how the Crawler works, click here.