by danielheath
17 days ago
It's circumstantial evidence, but Occam's Razor also applies. It's not a hostile DoS in the traditional sense (I've mitigated a few of those) - no "pay us to make it stop", no pattern to the requests other than "fetch every unique URL a few times". It wasn't happening until financial incentives to gather large datasets for AI training appeared. Bad actors (using residential proxies & claiming to be a real browser) mostly showed up after folk started blocking the bots that identified themselves as AI scrapers. AI training gets the blame because there's a shortage of better explanations. Who else would be paying for these (expensive) residential botnets, only to use them to (e.g.) web-scrape Wikipedia (which offers free downloads of its content in a structured format)? The simplest explanation of the technical behavior is "a bot coded to follow every link it sees & save the results", and the simplest explanation of the motive to run such a bot is "to train a large language model".
"use Cloudflare to make it stop"