Class Webcrawler.Crawler.URLTree
java.lang.Object
|
+----Webcrawler.Crawler.URLTree
- public class URLTree
- extends Object
Organizes a tree of URLNodes.
Whenever a node is loaded from the network, the loadedNode(n) method
must be called to register it. The checkLoaded(n) method can then report
whether a given URL has already been loaded, which avoids loading the
same document over and over again. (The implementation uses a Hashtable.)
The reference part (#) of an HTTP address only points to a location within
an HTML file, so there is no need to load a file with a reference if the
same file without the reference has already been loaded. For this reason,
loadedNode and checkLoaded only use the first part of the URL, without
the reference.
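The bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual URLTree source: the class LoadedUrlRegistry and its methods stripReference, markLoaded and isLoaded are hypothetical names, and URLs are handled as plain strings rather than as nodes.

```java
import java.util.Hashtable;

// Hypothetical helper, not part of Webcrawler.Crawler. It records loaded
// URLs in a Hashtable, keyed by the URL text without its reference part.
class LoadedUrlRegistry {
    private final Hashtable<String, Boolean> loaded = new Hashtable<>();

    // Returns the URL text up to, but not including, the first '#'.
    static String stripReference(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    // Records the URL (minus its reference part) as loaded.
    void markLoaded(String url) {
        loaded.put(stripReference(url), Boolean.TRUE);
    }

    // True if the same file was recorded before, with or without a reference.
    boolean isLoaded(String url) {
        return loaded.containsKey(stripReference(url));
    }
}
```

With this keying, a page registered as page.html#top is also reported as loaded when it is later requested as page.html or page.html#bottom.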
-
loaded
-
rootNode
URLTree(String)
- Creates a new HTMLNode as the root of this tree.
-
checkLoaded(LoadableNode)
- Reports whether the specified URL was already loaded from the network.
-
getRootNode()
- Returns the reference to the root of the tree.
-
loadedNode(LoadableNode)
- Registers the URL of node n as already loaded from the network.
rootNode
protected HTMLNode rootNode
loaded
protected Hashtable loaded
URLTree
public URLTree(String url) throws MalformedURLException
- Creates a new HTMLNode as the root of this tree.
loadedNode
protected void loadedNode(LoadableNode n)
- Registers the URL of node n as already loaded from the network.
(The URL is registered without its reference part.)
- Parameters:
- n - the node to be registered as loaded
checkLoaded
protected boolean checkLoaded(LoadableNode checkme)
- Reports whether the specified URL (checkme) was already loaded from the
network. Always call this method before loading a URL to make sure it
has not been downloaded before. This prevents unnecessary downloads.
If the URL of checkme has been loaded before, this method calls
checkme.copy(theloadednode) and sets checkme.URLType to recursive.
(URLs are compared without their reference part.)
- Parameters:
- checkme - the node whose URL is checked against the already loaded URLs
- Returns:
- true if the URL has been loaded before, false otherwise
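The check-then-copy behaviour described for checkLoaded might look roughly like the sketch below. The real LoadableNode and URLTree classes are not shown on this page, so SketchNode, SketchTree, the content field, the integer urlType constants and the body of copy() are all assumptions made for illustration only.

```java
import java.util.Hashtable;

// Stand-in for LoadableNode. All fields and the copy() body are assumptions.
class SketchNode {
    static final int NORMAL = 0;
    static final int RECURSIVE = 1; // assumed marker for an already loaded URL

    final String url;
    String content;
    int urlType = NORMAL;

    SketchNode(String url) { this.url = url; }

    // Assumed meaning of copy(): take over the data of the loaded node.
    void copy(SketchNode other) { this.content = other.content; }
}

// Stand-in for the loaded-URL part of URLTree.
class SketchTree {
    private final Hashtable<String, SketchNode> loaded = new Hashtable<>();

    // Hashtable key: the node's URL without the reference part.
    private static String key(SketchNode n) {
        int hash = n.url.indexOf('#');
        return hash < 0 ? n.url : n.url.substring(0, hash);
    }

    // Registers the node under its reference-free URL.
    void loadedNode(SketchNode n) { loaded.put(key(n), n); }

    // Returns false for an unseen URL. Otherwise copies from the node that
    // was loaded earlier, marks checkme as recursive, and returns true.
    boolean checkLoaded(SketchNode checkme) {
        SketchNode earlier = loaded.get(key(checkme));
        if (earlier == null) return false;
        checkme.copy(earlier);
        checkme.urlType = SketchNode.RECURSIVE;
        return true;
    }
}
```

A caller would typically test checkLoaded first, download the node only when it returns false, and then register the freshly downloaded node with loadedNode.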
getRootNode
public HTMLNode getRootNode()
- Returns the reference to the root of the tree.