Perl would certainly be my language of choice for screen scraping. Still, some people see it almost as stealing. I know people who view Google negatively because of how it indexes news. I have skimmed through a book (I don't remember the title) that seemed to claim that Google profits only from the work of others. (I think they add value by collating that information, though.) Nonetheless, you must be careful not to provoke a backlash (or a legal issue) with screen scraping. The irony is that one of the best targets for scraping content for your own benefit may be Google itself... however, it almost seems like they encourage it. (They would rather you use their APIs instead, though.)
Spidering has been around for a long time, yet people act like screen scraping is new. It is really the same thing that has existed for years. If you are going to do it, though, Perl is certainly the way to go: it is fast, efficient, and robust.
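For anyone who hasn't written one: at its core a spider is just a breadth-first walk over links, with a "seen" set so no page is fetched twice. Here's a minimal sketch in Python rather than Perl. The `get_links` callable is a hypothetical stand-in for "fetch this URL and return the links on it", and the tiny in-memory "web" replaces real HTTP so the example runs offline.

```python
from collections import deque


def crawl(start_url, get_links, max_pages=100):
    """Breadth-first spider: visit each reachable URL at most once.

    get_links(url) is a caller-supplied (hypothetical) function that
    fetches a page and returns the URLs it links to.
    """
    seen = {start_url}           # URLs already queued, so none is fetched twice
    queue = deque([start_url])   # frontier of URLs still to visit
    order = []                   # URLs in the order they were visited
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


if __name__ == "__main__":
    # A tiny in-memory "web" standing in for real HTTP fetches
    fake_web = {
        "a": ["b", "c"],
        "b": ["a", "d"],
        "c": ["d"],
        "d": [],
    }
    print(crawl("a", lambda url: fake_web.get(url, [])))  # ['a', 'b', 'c', 'd']
```

The `max_pages` cap is there because a real crawl over the open web otherwise has no natural stopping point.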
Theft requires that the taking deny the current owner access to, or use of, whatever was taken. Copyright infringement appears to be what you're referring to.
You are correct, and I agree. It is not theft, and Google is really a huge collection of mitochondria... helping the fledgling Internet become something more by combining it with an intellectual powerhouse. I could not have said it any better myself.
Perl has been the language of choice for web spidering since 1999, when it replaced REBOL for that purpose. Hint: the LWP module made such a dent in the industry that nobody was able to replace it until the last year or two, when other things started popping up.
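For a sense of what that kind of tooling does per page, here's a rough Python-stdlib sketch: fetch the raw HTML with a plain GET, then pull `href`s out of `<a>` tags and resolve them against the page URL. This is an illustration of the general approach, not LWP's API. The names `LinkExtractor`, `extract_links`, and `fetch` are made up for the example, and the demo parses an inline string so it runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    # Plain HTTP GET: returns the raw HTML as served; no JavaScript is run
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    page = '<p><a href="/about">About</a> <a href="https://example.org/x">X</a></p>'
    print(extract_links(page, "https://example.com/index.html"))
    # ['https://example.com/about', 'https://example.org/x']
```

Combined with a crawl loop, `lambda url: extract_links(fetch(url), url)` would serve as the "get links for this page" step.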
However, it's not able to execute JavaScript. The only library I found that handles a reasonable subset of JS is HttpUnit, in Java. Though its interface is kind of ugly IMO, I use it successfully. Driving it from a Clojure REPL makes it quite a handy tool for web scripting.
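To see why JavaScript support matters: a client that only parses the HTML it downloads never sees links that a script inserts after page load. The small Python sketch below uses a made-up page and Python's `html.parser` as a stand-in for any non-JS scraper. Only the static link is found, because the parser treats the `<script>` body as opaque text instead of running it.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Record href values of <a> tags that appear literally in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


# Hypothetical page: one link in the markup, one added by a script at runtime
page = """
<a href="/static">Static link</a>
<div id="menu"></div>
<script>
  document.getElementById("menu").innerHTML = '<a href="/dynamic">Dynamic</a>';
</script>
"""

parser = AnchorCollector()
parser.feed(page)
parser.close()
print(parser.hrefs)  # ['/static']  (the script-generated '/dynamic' link is never seen)
```

A JS-capable client such as HttpUnit would execute that script and see the resulting DOM, which is the whole point of using one.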