Hacker News
by dpryden 1693 days ago
It always amazes me how people believe they have a right to retrieve data from a website. The HTTP protocol calls it a request for a reason: you are asking for data. The server is allowed to say no, for any reason it likes, even a reason you don't agree with.

This whole field of scraping and anti-bot technology is an arms race: one side gets better at something, the other side gets better at countering it. An arms race benefits no one but the arms dealers.

If we translate this behavior into the real world, it ends up looking like https://xkcd.com/1499

1 comments

Because often that data is only available through scraping.

Nobody wants to scrape: it's messy and fickle and a general pain in the backside. But sometimes the data you need exists only in that form.

If you run a website and you have a problem with scrapers, then make all that data available through an API and say what the acceptable rate limits are. If cost is an issue, then charge a proportionate fee; the time I spend writing a scraper is worth much more than the few dollars an API would cost.

If you just say "No" to everything then you lose all control over the process and the only outcome will be such an arms race.
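
For what it's worth, a published rate limit doesn't take much code to enforce. Here is a minimal sketch of one common approach, a token bucket, assuming a limit expressed as requests per second plus a burst allowance. The names `TokenBucket` and `status_for` are made up for illustration and don't come from any particular framework; a real API would keep one bucket per API key and probably send a Retry-After header with the 429.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True and spend one token if a request may proceed now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def status_for(bucket):
    """Hypothetical helper: the HTTP status an API might return for one request."""
    # 429 is the standard "Too Many Requests" status code.
    return 200 if bucket.allow() else 429
```

With `rate=1` and `capacity=2`, a client gets two requests straight away, then a 429 until roughly a second has passed. The server still gets to say no, but the client knows in advance when and why.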

God. This. The number of times I've spent 2 days of my very expensive time coding a scraper to get data I'll use once, when I would have paid a few dollars just to download it in a text file.