WebCopy - copy a remote web subtree to the local disk

Given one or more URLs as arguments, enumerates the files reachable at or below those URLs and copies them to the local disk, creating subdirectories as necessary.
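The enumeration step can be pictured as a breadth-first walk that only follows links staying at or below the starting URL. The sketch below is an illustration under stated assumptions, not WebCopy's actual code: the class `CrawlSketch`, the method `enumerate`, and the in-memory `links` map (standing in for fetching and parsing pages) are all hypothetical, the starting URL is assumed to end in "/", and a negative `maxDepth` is an invented convention meaning "no limit".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrawlSketch {
    /**
     * Lists the URLs reachable from root without leaving its subtree.
     * links maps each page to the URLs it references (hypothetical stand-in
     * for downloading and parsing HTML). maxDepth < 0 means unlimited.
     */
    static List<String> enumerate(String root, Map<String, List<String>> links, int maxDepth) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();

        seen.add(root);
        depth.put(root, 0);
        queue.add(root);

        while (!queue.isEmpty()) {
            String url = queue.remove();
            order.add(url);
            int d = depth.get(url);
            // At the depth limit: keep this file but follow none of its links.
            if (maxDepth >= 0 && d >= maxDepth) {
                continue;
            }
            for (String next : links.getOrDefault(url, Collections.emptyList())) {
                // "At or below" is approximated by a simple prefix test.
                if (next.startsWith(root) && seen.add(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return order;
    }
}
```

With a depth of 0 only the starting URL is returned, which matches the behaviour described for the -d flag in the option list.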

Options:

-v
Verbose. Shows names of files being copied.
-f
Force overwriting of existing files. Otherwise they are left alone.
-d
Maximum depth to copy. Depth is the number of links to follow from the starting URL. A depth of 0 copies only the file given on the command line and follows no links at all. Without this flag there is no limit on the depth, and the entire subtree is copied.
-e
Edit local URLs. If an HTML file contains a URL that is *unnecessarily* absolute - i.e. it's absolute but refers to a location within the tree being copied - then it is converted to a relative URL. Without this flag, all files are copied verbatim. With it, the copied tree is a self-contained, functional snapshot of the remote tree.
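To make the -e rewriting concrete, here is a minimal sketch of how an unnecessarily absolute URL might be turned into a relative one. It is not taken from WebCopy itself: the class `EditUrlSketch` and the methods `segments` and `editUrl` are hypothetical names, the root URL is assumed to end in "/", and the page is assumed to lie inside the tree. The relative path is computed by hand because `java.net.URI.relativize` does not produce "../" steps.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class EditUrlSketch {
    // Splits a directory path such as "/jef/flow/" into its non-empty segments.
    static List<String> segments(String dir) {
        List<String> out = new ArrayList<>();
        for (String s : dir.split("/")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    /**
     * Returns href rewritten relative to page when href is absolute and points
     * inside the tree rooted at root; otherwise returns href unchanged.
     */
    static String editUrl(URI root, URI page, String href) {
        URI target;
        try {
            target = URI.create(href);
        } catch (IllegalArgumentException e) {
            return href; // unparseable: leave it verbatim
        }
        String path = target.getRawPath();
        if (!target.isAbsolute() || path == null) {
            return href; // already relative, or opaque such as mailto:
        }
        boolean sameSite = target.getScheme().equalsIgnoreCase(root.getScheme())
                && target.getRawAuthority() != null
                && target.getRawAuthority().equalsIgnoreCase(root.getRawAuthority());
        if (!sameSite || !path.startsWith(root.getRawPath())) {
            return href; // outside the copied tree: must stay absolute
        }

        String pagePath = page.getRawPath();
        List<String> from = segments(pagePath.substring(0, pagePath.lastIndexOf('/') + 1));
        List<String> to = segments(path.substring(0, path.lastIndexOf('/') + 1));

        // Skip the directories that the page and the target have in common.
        int common = 0;
        while (common < from.size() && common < to.size()
                && from.get(common).equals(to.get(common))) {
            common++;
        }
        StringBuilder rel = new StringBuilder();
        for (int i = common; i < from.size(); i++) {
            rel.append("../"); // climb out of the page's directory
        }
        for (int i = common; i < to.size(); i++) {
            rel.append(to.get(i)).append('/'); // descend to the target's directory
        }
        rel.append(path.substring(path.lastIndexOf('/') + 1)); // file name, possibly empty
        if (rel.length() == 0) {
            rel.append("./"); // target is the page's own directory
        }
        if (target.getRawQuery() != null) {
            rel.append('?').append(target.getRawQuery());
        }
        if (target.getRawFragment() != null) {
            rel.append('#').append(target.getRawFragment());
        }
        return rel.toString();
    }
}
```

Under these assumptions, a link to http://www.acme.com/jef/flow/cdec.html found in snapshots/16may96.html would become ../cdec.html, while a link to http://www.acme.com/jef/ would be left alone because it points outside the tree.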

Sample run:


% mkdir flow
% cd flow
% WebCopy -v http://www.acme.com/jef/flow/
Copying http://www.acme.com/jef/flow/ to index.html
Copying http://www.acme.com/jef/flow/troublemaker_small.jpg to troublemaker_small.jpg
Copying http://www.acme.com/jef/flow/cdec.html to cdec.html
Copying http://www.acme.com/jef/flow/snapshots/ to snapshots/index.html
Copying http://www.acme.com/jef/flow/snapshots/16may96.html to snapshots/16may96.html
Copying http://www.acme.com/jef/flow/snapshots/16may96_namerican.gif to snapshots/16may96_namerican.gif
% ls -l
-rw-r--r--   1 jef         39759 Jul  5 14:40 cdec.html
-rw-r--r--   1 jef           993 Jul  5 14:40 index.html
drwxr-x--x   2 jef           512 Jul  5 14:40 snapshots
-rw-r--r--   1 jef          3107 Jul  5 14:40 troublemaker_small.jpg
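The sample run suggests a simple mapping from remote URLs to local names: the part of the URL below the starting URL becomes the local path, and a URL ending in "/" is saved as index.html in the corresponding directory. The sketch below reproduces that mapping as inferred from the output above. The class `LocalPathSketch` and its methods are hypothetical, and the real program may handle cases (query strings, odd characters) that this does not.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalPathSketch {
    /** Maps a URL inside the tree to a path relative to the current directory. */
    static String localPath(String root, String url) {
        if (!url.startsWith(root)) {
            throw new IllegalArgumentException(url + " is not below " + root);
        }
        String rel = url.substring(root.length());
        // Directory URLs are stored as index.html, as in the sample run.
        if (rel.isEmpty() || rel.endsWith("/")) {
            rel += "index.html";
        }
        return rel;
    }

    /** Resolves the local file under baseDir and creates any missing subdirectories. */
    static Path prepareTarget(Path baseDir, String root, String url) throws IOException {
        Path target = baseDir.resolve(localPath(root, url));
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return target;
    }
}
```

For the run shown, http://www.acme.com/jef/flow/snapshots/ maps to snapshots/index.html, and preparing that target is what would create the snapshots directory.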

See also: JavaWrapper, Acme.Spider.

