by rhett 6307 days ago
Here is a web spider in one line: perl -MLWP::UserAgent -MHTML::LinkExtor -MURI::URL -lwe '$ua = LWP::UserAgent->new; while (my $link = shift @ARGV) { print STDERR "working on $link"; HTML::LinkExtor->new( sub { my ($t, %a) = @_; my @links = map { url($_, $link)->abs() } grep { defined } @a{qw/href src/}; print STDERR "+ $_" foreach @links; push @ARGV, @links} )->parse(do { my $r = $ua->simple_request(HTTP::Request->new("GET", $link)); $r->content_type eq "text/html" ? $r->content : ""; } ) }' http://www.google.com
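For anyone who finds the one-liner dense, here is a rough sketch of the same queue-driven loop, written in Python rather than Perl so it can be spread over several lines. It is not a translation of the original: the names `LinkParser`, `extract_links`, `fetch_html`, and `crawl` are made up for illustration, only `href` and `src` attributes are collected, and the `seen` set and `max_pages` limit are additions (the one-liner keeps no record of visited URLs and runs until its queue is empty). Note also that `urlopen` follows redirects, which `simple_request` does not.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href and src attribute values, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs found in an HTML string."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def fetch_html(url):
    """Return the body if the response is text/html, else an empty string."""
    with urlopen(url, timeout=10) as resp:
        if resp.headers.get_content_type() != "text/html":
            return ""
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Visit pages breadth-first and return the URLs in visiting order.

    The seen set and max_pages cap are assumptions added for safety.
    """
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        try:
            html = fetch(url)
        except (OSError, ValueError):
            # Network failures and unsupported URL schemes are skipped.
            continue
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("http://www.google.com")` would start from the same seed URL as the one-liner. The `fetch` parameter is there so the loop can be exercised against canned pages without touching the network.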
2 comments

I tested the above script and it works. You are good.
Thanks, I didn't write it. I remembered it from a magazine from 1999. The link to the author is in the comment below.