by Freebytes 6038 days ago
Perl would certainly be my language of choice for screen scraping. Some people, however, see it as almost a form of stealing. I know people who view Google negatively because of how it indexes news. I once skimmed a book (I do not remember the title) that seemed to claim that Google profits only from the work of others. (I think Google does add value by bringing that information together, though.) Nonetheless, you must be careful not to provoke a backlash (or a legal issue) when screen scraping. The irony is that one of the best targets for scraping content for your own benefit may be Google itself... and it almost seems as if they encourage it. (They would rather you use their APIs instead, though.)

Spidering has been around for a long time, yet people act as if screen scraping is new. It is really the same technique that has existed for years. If you are going to do it, though, Perl is certainly the way to go: it is fast, efficient, and robust.


Theft requires that the taking deprive the current owner of access to, or use of, whatever was taken. Copyright infringement appears to be what you're referring to.

Google IMO is more of a symbiont than a parasite.

You are correct, and I agree. It is not theft, and Google is really a huge collection of mitochondria... helping the fledgling Internet become something more by pairing it with an intellectual powerhouse. I could not have said it better myself.