| I personally don't care about the intended use of the crawling -- and I also don't know that the bots we are seeing now are "AI bots"; I would not have used that phrase. What many of us have seen is a huge increase in bot crawling traffic from highly distributed IPs, often requesting insane combinations of query params that don't actually get them useful content, and it is bringing down our sites. (And they increase their volume if you scale up your resources!) They seem to have very deep pockets, in that they don't mind scraping terabytes of useless/duplicate content from me (they could get all the actually useful open content from my sitemap and I wouldn't mind!). That's what bothers me. I don't care if they scrape my site for AI purposes in polite, robots.txt-respecting, honest-user-agent, low-volume ways. And if they are crawling the way they are for something other than AI, it's just as much of a problem. (The best guess is just that it's for AI.) So I agree with you that I wouldn't have spoken of this in terms of "AI". But it has become a huge problem.
"Fighting the AI scraperbot scourge" https://lwn.net/Articles/1008897/
"LLM crawlers continue to DDoS SourceHut" https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/
"Open Source devs say AI crawlers dominate traffic, forcing blocks on entire countries" https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-domi...
Some of us -- we think jokingly -- wonder if Cloudflare or other WAF purveyors are behind it. It is leaving most of us no choice but to adopt some kind of WAF or bot detection. |