| HN Mirror


by danielheath 17 days ago

It's circumstantial evidence, but Occam's Razor also applies.

It's not a hostile DoS in the traditional sense (I've mitigated a few of those) - no "pay us to make it stop", no pattern to the requests other than "fetch every unique URL a few times".

It wasn't happening until financial incentives to gather large datasets for AI training appeared.

Bad actors (using residential proxies & claiming to be real browsers) mostly showed up after folk started blocking the bots that identified themselves as AI scrapers.

It's obvious to blame AI training because there's a shortage of better explanations. Who else would be paying for these (expensive) residential botnets, only to use them to (e.g.) web-scrape Wikipedia (which offers free downloads of its content in a structured format)?

The simplest explanation of the technical behavior is "a bot coded to follow every link it sees & save the results", and the simplest explanation of the motive to run such a bot is "to train a large language model".

1 comments

userbinator 16 days ago

no "pay us to make it stop"

"use Cloudflare to make it stop"


danielheath 16 days ago

Or Fastly, or Akamai, or Bunny, or any number of other providers.

Cloudflare are merely the cheapest of the bunch.


userbinator 16 days ago

Exactly. They (and most of all, Big G) stand to profit greatly from this browser discrimination. What better way to make more sites use them than by launching DDoS attacks in the name of "AI scraping"?
