| HN Mirror



	by stevefeinstein 3246 days ago
	So again someone wants to punish all the legitimate people using a web site to get some marginal benefit from detecting the remaining <1%. The inevitable false positives don't affect the "malicious" users. Only the legitimate ones. And how much will this bloat the page load by? Adding more code to an already overly large page isn't helping anyone. Just let the web be the web, and stop trying to control it.

2 comments

tyingq 3245 days ago

Mentioned this in another comment, but for some websites, the scraping problem has real costs. Airline, hotel, stock prices, etc. For some spaces, scaling and paying bandwidth for unconstrained scraping is costly. And not restricting it hurts the legitimate users because the performance sucks.

There are also the scrapers blindly looking for vulnerabilities or other unsavory tactics.


BinaryIdiot 3245 days ago

While I agree with you to a degree, most airlines and hotels have APIs that can be consumed to get pricing information; there are just restrictions on what you can do with that information.

Not sure about stock prices (I think it's pretty common to pay for real time data there?).

But I can certainly see sites that have a lot of data for their users facing major bandwidth costs if a lot of people were scraping their data. This type of detection isn't really an answer for that, though, as it's easy for a scraper to work around.


always_good 3245 days ago

Also, what I've learned is how little regard for your site your scrapers often have, scraping as aggressively as possible.

You're just not always in a place to scale to the abuse or build something more complex than some simple heuristic filters.
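For illustration, one simple heuristic filter of the kind meant here could be a per-client sliding-window request counter. This is only a minimal sketch: the class name `SlidingWindowFilter`, the `client_id` key (an IP address, say), and the default thresholds are all assumptions made for the example, not anything a particular site actually runs.

```python
from collections import defaultdict, deque


class SlidingWindowFilter:
    """Hypothetical filter: reject a client that exceeds max_requests
    within the last window_seconds."""

    def __init__(self, max_requests=30, window_seconds=60.0):
        # Both thresholds are illustrative defaults, not recommendations.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client id -> timestamps of that client's recently allowed requests
        self._hits = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True if the request at time `now` (in seconds) may proceed."""
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In practice `now` would come from something like `time.monotonic()`, and a rejected request would typically get an HTTP 429 response. A filter like this only catches the aggressive scrapers; anything that paces itself like a human passes straight through.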


josteink 3245 days ago

> what I've learned is how little regard for your site your scrapers often have, scraping as aggressively as possible.

Often? Based on what data?

I find it much more likely you only often notice aggressive scrapers. That however tells you nothing about the behavior of the average web scraper or web scrapers in general.


tyingq 3244 days ago

The system encourages it. Ingress data is cheap, and so many scrapers just default to high frequency.


mdominguez 3245 days ago

You're not taking into consideration the economic consequences some bots have on companies. Some bots are designed to make payments with fake or stolen credit cards. Some bots create work for people who need to manually check submissions or takedown notices. Obviously I agree that there are legitimate uses for bots and scrapers, but that admittedly low percentage of fraudulent use cases does cause a lot of harm.


walterstucco 3245 days ago

Some bots can and will use a real browser, in a real window, opening many sessions, randomising usage patterns to look more human, and continue doing what they already do.


chii 3245 days ago

If a human is allowed to do something on a site, it stands to reason that a bot should be allowed too (granted it uses the same access frequency as a human).

Blocking scraping is like DRM. Don't do it. Use a legal mechanism to deal with copyright infringement, and use an acceptable use policy to deal with heavy users that are using more than their "fair" share of bandwidth.


kuschku 3245 days ago

Some people just hire actual humans to do this, for cheap.
