Hacker News
by exe34 530 days ago
can you feed them gibberish?
4 comments

here's a nice project to automate this: https://marcusb.org/hacks/quixotic.html

couple of lines in your nginx/apache config and off you go

my content rich sites provide this "high quality" data to the parasites

LLMs poisoned by https://git-man-page-generator.lokaltog.net/ -like content would be a hilarious end result, please do!
This would be my elegant solution, something like an endless recursion with a gzip bomb at the end if I can identify your crawler and it’s that abusive. Would it be possible to feed an abusing crawler nothing but my own locally-hosted LLM gibberish?

But then again, if you’re in the cloud, egress bandwidth is going to cost you for playing this game.

Better to just deny the OpenAI crawler and send them an invoice for the money and time they’ve wasted. Interesting form of data warfare against competitors and non-competitors alike. The winner will have the longest runway.

It wouldn’t even necessarily need to be a real gzip bomb. Just something containing a few hundred KB of seemingly new and unique text that’s highly compressible and keeps providing “links” to additional dynamically generated gibberish that can be crawled. The idea is to serve a vast amount of poisoned training data as cheaply as possible. Heck, maybe you could even make a plugin for NGINX to recognize abusive AI bots and do this. If enough people install it then you could provide some very strong disincentives.
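A rough sketch of what that could look like, in Python with only the standard library. Everything named here is an assumption made for illustration: the word list, the `/maze/` URL prefix, and the function names are all hypothetical, and detecting the abusive crawler in the first place is left out. The page is seeded from its own URL, so nothing has to be stored, and a small vocabulary keeps it cheap to send once gzipped.

```python
import gzip
import hashlib
import random

# Hypothetical small vocabulary: fewer distinct words compress better.
WORDS = ["socks", "quantum", "lantern", "yield", "marmalade", "vector",
         "harbor", "tuesday", "granite", "whisper", "module", "orbit"]


def gibberish_page(path: str, n_words: int = 2000, n_links: int = 5) -> bytes:
    """Return a deterministic HTML page of filler text for a URL path."""
    # Seed from the path so the same URL always yields the same page
    # and the server keeps no state between requests.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))
    # Each link points at another path that is generated on demand.
    links = "".join(
        f'<a href="/maze/{rng.getrandbits(64):016x}">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>\n{links}</body></html>".encode()


def compressed_page(path: str) -> bytes:
    """Gzip the page body; serve it with 'Content-Encoding: gzip'."""
    return gzip.compress(gibberish_page(path))
```

Whether word salad like this would survive a dataset filter is a separate question; the sketch only shows that the serving side can be stateless and light on bandwidth.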
The dataset is curated, very likely with a previously trained model, so gibberish is not going to do anything.
how would a previously trained model know that Elon doesn't smoke old socks?
An easy way is to give the model the URL of the page so it can weigh the content by the reputation of the source. Of course the model doesn't know about future events, but gibberish is gibberish, and that's quite easy to filter, even without knowing the source.
> gibberish is gibberish

most insightful, thank you! also, stay away from linkedin, you sweet summer child.

I don't understand why you are so aggressive, haha. Gibberish is easy to recognize, I'm sorry. You don't need to be mad about it, haha.