This is a web spider (robot) I wrote some years ago. It recursively visits the pages of a site and downloads their contents according to the specified filter parameters.
For example, you can download all HTML pages or all images from a website in a single run.
The program stores a list of all downloaded content, so it can resume a partially finished download at a later time.
Note that the spider always remains on the same domain, to avoid downloading the whole internet.
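The crawl loop behind this is simple. Here is a minimal Java sketch of the idea (hypothetical names and structure, not the actual Lopo source; the download and link extraction are omitted):

  import java.net.URI;
  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.HashSet;
  import java.util.Set;

  public class CrawlSketch {
      public static void main(String[] args) throws Exception {
          URI start = new URI(args[0]);
          String domain = start.getHost();        // the spider never leaves this domain
          Set<String> visited = new HashSet<>();  // Lopo persists this list, which is what
                                                  // allows resuming a partial download
          Deque<URI> queue = new ArrayDeque<>();
          queue.add(start);
          while (!queue.isEmpty()) {
              URI page = queue.poll();
              if (page.getHost() == null || !domain.equalsIgnoreCase(page.getHost()))
                  continue;                       // stay on the same domain
              if (!visited.add(page.toString()))
                  continue;                       // already downloaded
              // fetch the page, keep it if it passes the extension/size filters,
              // parse its links and add them to the queue (omitted in this sketch)
          }
      }
  }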
There are both a Java and a C version. The C version runs faster than the Java one, but it needs some DLLs (MFC42.DLL, WinInet.DLL) in order to run. These are normally located in the System32 subdirectory of Windows.
How to start the C version:
lopoc URL minimum_size maximum_size [extension1 extension2 ...]
How to start the Java version:
java Lopo URL minimum_size maximum_size extension1 [extension2 ...]
where the parameters extension1, extension2, … filter the files by file extension, and minimum_size and maximum_size filter them by file size. Note that, because of an unfinished implementation, filtering by size has no effect in the C version.
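For example, to download only JPG and GIF images from a (hypothetical) site, and assuming the sizes are given in bytes, the Java version could be started like this:

java Lopo http://www.example.com 10240 5242880 jpg gif

This would restrict the download to files between 10 KB and 5 MB carrying one of the two extensions.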
After starting, you must provide some further parameters to the program, such as authentication information (login and password) for the local proxy and/or the remote server.
Download: