Hacker News
by matthewmacleod 2944 days ago
On the one hand, it makes a lot of sense that many web publishers want to keep people from scraping their content, given how often scraped content is used nefariously, whether to violate copyright or for spam.

But there are totally legitimate reasons to scrape as well. Altmetric (https://www.altmetric.com), which is the company I work for, tracks links to scientific research. So when someone on e.g. Twitter links to a page on nature.com, we want to scrape the page they linked to and figure out which paper they are talking about (if any). Academic publishers can be particularly sensitive to scraping, making the endeavour much more work than it needs to be.
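
To make the "figure out which paper" step concrete, here is a rough sketch. It assumes the publisher page exposes Highwire-style `citation_*` meta tags such as `citation_doi`, which many academic publishers do but by no means all. It is not Altmetric's actual implementation; `CitationMetaParser`, `find_paper`, and the sample HTML (including its DOI) are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collects <meta name="citation_*" content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.citation = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name.startswith("citation_") and content:
            # Keep the first value seen for each tag name.
            self.citation.setdefault(name, content.strip())


def find_paper(html):
    """Return (doi, title) from citation meta tags, or None if no DOI is present."""
    parser = CitationMetaParser()
    parser.feed(html)
    doi = parser.citation.get("citation_doi")
    if doi is None:
        return None
    return doi, parser.citation.get("citation_title")


# Hypothetical page content; the DOI and title are invented for this example.
sample_html = """
<html><head>
<meta name="citation_title" content="An example paper">
<meta name="citation_doi" content="10.1234/example.5678">
</head><body>...</body></html>
"""

print(find_paper(sample_html))
```

The sketch only parses HTML that has already been fetched. Actually getting the page is the part publishers tend to make hard, and a page with no `citation_doi` tag simply yields `None` here.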

It's a real shame that the web has moved to be so closed off in many ways.

1 comment

The web is not becoming closed off from users. It's becoming hostile to bots. Not the same.