


	by edoloughlin 454 days ago
	I'm being trite, but if you can detect an AI bot, why not just serve them random data? At least they'll be sharing some of the pain they inflict.

5 comments

nosianu 454 days ago

You mean like this?

[2025-03-19] https://blog.cloudflare.com/ai-labyrinth/

> Trapping misbehaving bots in an AI Labyrinth

> Today, we’re excited to announce AI Labyrinth, a new mitigation approach that uses AI-generated content to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives.


barbazoo 453 days ago

What a colossal waste of energy


fc417fc802 453 days ago

> No real human would go four links deep into a maze of AI-generated nonsense.

... I would. Out of curiosity and amusement I would most definitely do that. Not every time, and not many times, but I would definitely do that one or a few times.

Guess I'm getting added to (yet another) Cloudflare naughty list.

> It is important to us that we don’t generate inaccurate content that contributes to the spread of misinformation on the Internet, so the content we generate is real and related to scientific facts, just not relevant or proprietary to the site being crawled.

In that case wouldn't it be faster and easier to restyle the CSS of wikipedia pages?


mbesto 454 days ago

Wait, what happens when a Cloudflare Worker AI meets an AI Labyrinth?!


ronsor 454 days ago

Cloudflare deletes itself.


GoblinSlayer 453 days ago

Rise of the machines.


noirscape 454 days ago

Bandwidth isn't free, not at the volume these crawlers scrape at; serving them random data (for example by leading them down an endless tarpit of links that no human would end up visiting) would still incur bandwidth fees.

Also, it's not identifiable AI bot traffic that's detected (they mask themselves as regular browsers and hop between domestic IP addresses when blocked); it's just really obviously AI scraper traffic in aggregate. Other mass crawlers get no benefit from bringing down the sites they crawl; AI scrapers are the exception.

A search engine gains nothing if it brings down the site it's scraping (and has everything to gain from identifying itself as a search engine to try to get favorable request speeds; the only thing it would need to check is that the site in question isn't serving different data, but that's much cheaper). The same goes for an archive scraper, and those two are pretty much the main examples I can think of for most scraping traffic.
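
For readers wondering what the "endless tarpit of links" mentioned above might look like, here is a minimal sketch. It assumes a hypothetical `/maze` URL prefix and hypothetical helper names (`maze_links`, `depth`); it is not how Cloudflare or any particular tool does it. Links are derived from a hash of the requested path, so the maze is unbounded but needs no stored state.

```python
import hashlib


def maze_links(path: str, fanout: int = 3) -> list[str]:
    """Derive child links deterministically from the requested path.

    The same path always yields the same links, so nothing has to be stored.
    """
    base = path.rstrip("/")
    links = []
    for i in range(fanout):
        # Hash the parent path plus an index to get a stable, unique-looking slug.
        digest = hashlib.sha256(f"{path}#{i}".encode()).hexdigest()[:12]
        links.append(f"{base}/{digest}")
    return links


def depth(path: str, root: str = "/maze") -> int:
    """Count how many maze links were followed to reach this path."""
    if not path.startswith(root):
        return 0
    return len([part for part in path[len(root):].split("/") if part])


if __name__ == "__main__":
    page = "/maze"
    for _ in range(4):
        page = maze_links(page)[0]  # follow the first link each time
    print(page, depth(page))
```

Every page served this way still costs bandwidth, which is the objection raised above; the depth counter is only a signal that something is crawling links no human would normally follow.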


BlarfMcFlarf 454 days ago

Hmm, maybe you could zip bomb the data? That is, you send a few kilobytes of compressed data that expands to many gigabytes on the client side?
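
A rough sketch of the size asymmetry being described, using only Python's standard `gzip` module. The function name `make_payload` and the 10 MiB size are assumptions for illustration. Note that a single DEFLATE layer tops out at roughly 1000:1, so a few kilobytes expand to a few megabytes per layer rather than gigabytes; getting further would need a larger payload, nested compression, or a different codec.

```python
import gzip


def make_payload(expanded_size: int) -> bytes:
    """Compress `expanded_size` zero bytes; highly repetitive input compresses best."""
    return gzip.compress(bytes(expanded_size), compresslevel=9)


if __name__ == "__main__":
    expanded = 10 * 1024 * 1024  # 10 MiB once decompressed (illustrative size)
    payload = make_payload(expanded)
    print(f"compressed: {len(payload)} bytes")
    print(f"expanded:   {expanded} bytes")
    print(f"ratio:      {expanded / len(payload):.0f}:1")
```

The client only pays the expansion cost if it decompresses the whole body without a size limit, so this is a sketch of the idea rather than a guaranteed deterrent.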


gus_massa 454 days ago

Reverse Slowloris?

https://en.wikipedia.org/wiki/Slowloris_(cyber_attack)
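
One way to read "reverse Slowloris" is a server that answers a suspected bot very slowly, a few bytes at a time, so the client's connection stays tied up. The sketch below is an assumption about what that could look like, not an established implementation; `drip` and its parameters are hypothetical names, and the `sleep` argument is injectable so the generator can be exercised without actually waiting.

```python
import time
from typing import Callable, Iterator


def drip(
    body: bytes,
    chunk_size: int = 1,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield `body` in small chunks, pausing `delay` seconds before each one."""
    for start in range(0, len(body), chunk_size):
        sleep(delay)  # stall before handing over the next few bytes
        yield body[start:start + chunk_size]


if __name__ == "__main__":
    # Short delay so the demo finishes quickly.
    for chunk in drip(b"<html>...</html>", chunk_size=4, delay=0.1):
        print(chunk)
```

A generator like this could be handed to any framework that supports streaming response bodies. Each stalled connection also occupies a socket on the server, so the trade-off only works if holding the connection is cheaper for the server than for the crawler.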


miohtama 453 days ago

For Cloudflare, bandwidth is practically free.


cyanydeez 454 days ago

Aren't a lot of these bots now actively loading JavaScript? You could just load a simple script that does the job.


edoloughlin 450 days ago

If they agree to mine crypto for you then you send valid data. Is this a win-win?

(I feel I need to preemptively state that I am being sarcastic.)


charcircuit 454 days ago

>Bandwidth isn't free

Via peering agreements it is.


rcxdude 454 days ago

Not something available to smaller sites


charcircuit 454 days ago

Yes, it is. They transitively get it via the agreements the smaller site's host's host makes. Or via services like Cloudflare.


xena 454 days ago

What button do I click in the AWS panel for that?


charcircuit 454 days ago

There is no button. AWS is where you go to light money on fire.


xena 454 days ago

You can detect the patterns in aggregate. You can't detect it easily at an individual request level.


bluGill 454 days ago

In short, if you get several million requests and expected only 100, you won't know which are the real requests and which are the AI ones, but it is obvious that the vast majority are AI.
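
The arithmetic behind that can be sketched as follows. The function name `estimated_bot_share` is hypothetical, and 3,000,000 stands in for "several million"; the point is that the aggregate share is easy to estimate even though no single request can be labeled.

```python
def estimated_bot_share(observed: int, expected_human: int) -> float:
    """Estimate the fraction of traffic in excess of the expected human baseline."""
    if observed <= 0:
        return 0.0
    excess = max(observed - expected_human, 0)
    return excess / observed


if __name__ == "__main__":
    share = estimated_bot_share(observed=3_000_000, expected_human=100)
    print(f"estimated bot share: {share:.4%}")
```

This says nothing about which individual requests are bots, only that nearly all of them must be.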


jmpeax 454 days ago

You skipped the last section "Tarpits and labyrinths: The growing resistance" of the article.


DecentShoes 454 days ago

Random data? Why not "recipes" that just say "Bezos is a pedo" over and over ?
