Hacker News
by tyingq 3246 days ago
Mentioned this in another comment, but for some websites, the scraping problem has real costs: airline, hotel, stock prices, etc. For some spaces, scaling up and paying for the bandwidth that unconstrained scraping consumes is costly. And not restricting it hurts the legitimate users because the performance sucks.

There are also the scrapers blindly looking for vulnerabilities or other unsavory tactics.

2 comments

While I agree with you to a degree, most airlines and hotels have APIs that can be consumed to get pricing information; there are just restrictions on what you can do with that information.

Not sure about stock prices (I think it's pretty common to pay for real-time data there?).

But I can certainly see sites that have a lot of data for their users facing major bandwidth costs if a lot of people were scraping their data. This type of detection isn't really an answer for that, though, as it's easy for a scraper to work around.

Also, what I've learned is how little regard for your site your scrapers often have, scraping as aggressively as possible.

You're just not always in a place to scale to the abuse or build something more complex than some simple heuristic filters.
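
To give an idea of what a "simple heuristic filter" can look like, here is a rough sketch of a per-client sliding-window counter in Python. The class name, the thresholds, and the choice to key on some client id (IP address, API key, whatever you have) are all assumptions made for illustration, not anything a particular site is known to run:

```python
from collections import defaultdict, deque


class SlidingWindowFilter:
    """Hypothetical filter: rejects a client once it has made
    max_requests accepted requests within the last window_seconds."""

    def __init__(self, max_requests=5, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client id -> timestamps of that client's recent accepted requests
        self._hits = defaultdict(deque)

    def allow(self, client_id, now):
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject (or throttle, or serve a CAPTCHA)
        hits.append(now)
        return True
```

In a real handler you would pass something like `time.monotonic()` as `now`. A scraper that rotates IPs or slows down slips right past this, so it only helps against the careless, aggressive ones.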

> what I've learned is how little regard for your site your scrapers often have, scraping as aggressively as possible.

Often? Based on what data?

I find it much more likely that you only notice the aggressive scrapers. That, however, tells you nothing about the behavior of the average web scraper or web scrapers in general.

The system encourages it. Ingress data is cheap, and so many scrapers just default to high frequency.