|
|
|
|
|
by rhett
6307 days ago
|
|
Here is a web spider in 1 line:
perl -MLWP::UserAgent -MHTML::LinkExtor -MURI::URL -lwe '$ua = LWP::UserAgent->new; while (my $link = shift @ARGV) { print STDERR "working on $link";HTML::LinkExtor->new( sub { my ($t, %a) = @_; my @links = map { url($_, $link)->abs() } grep { defined } @a{qw/href img/}; print STDERR "+ $_" foreach @links; push @ARGV, @links} )->parse(do { my $r = $ua->simple_request (HTTP::Request->new("GET", $link)); $r->content_type eq "text/html" ? $r-> content : ""; } ) }' http://www.google.com |
|
http://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0013.html