Hacker News
by fragmede 959 days ago
robots.txt is actually a really useful way to tell an attacker where to look for juicy content that doesn't want to be indexed, because following it is entirely voluntary. It's easy to imagine a dark web search engine that only has that content.

If you want your stuff kept out in the same (voluntary) way, but for OpenAI training, just block GPTBot in your robots.txt:

https://platform.openai.com/docs/gptbot
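
For anyone who wants to sanity-check such a rule, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, and the example.com URLs are made up for illustration. It only shows how a well-behaved crawler would interpret the file; nothing here forces a crawler to comply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is disallowed everywhere,
# every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("GPTBot", "https://example.com/post"))
    print(is_allowed("SomeOtherBot", "https://example.com/post"))
```

The check is purely advisory: a crawler that never reads robots.txt, or ignores it, is unaffected.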

1 comment

Just a thought: what about a dummy/honeypot path in robots.txt? If any request is made for that path, block connections from that source.
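
A minimal sketch of that honeypot idea, assuming an in-memory blocklist keyed by client IP. The `HoneypotBlocker` class, the `/private-backup/` trap path, and the status codes are all hypothetical choices for illustration; a real setup would more likely live in the web server or firewall config.

```python
class HoneypotBlocker:
    """Advertise a trap path in robots.txt and block whoever requests it."""

    def __init__(self, trap_prefix: str = "/private-backup/"):
        # Hypothetical trap path: nothing legitimate links to it, so only
        # clients that mined robots.txt for targets should ever request it.
        self.trap_prefix = trap_prefix
        self.blocked: set[str] = set()

    def robots_txt(self) -> str:
        """robots.txt body that lists the trap path as disallowed."""
        return f"User-agent: *\nDisallow: {self.trap_prefix}\n"

    def handle(self, client_ip: str, path: str) -> int:
        """Return an HTTP-style status code for one request."""
        if client_ip in self.blocked:
            return 403  # already caught earlier
        if path.startswith(self.trap_prefix):
            self.blocked.add(client_ip)  # touched the trap: block from now on
            return 403
        return 200


if __name__ == "__main__":
    blocker = HoneypotBlocker()
    print(blocker.robots_txt())
    print(blocker.handle("203.0.113.5", "/index.html"))
    print(blocker.handle("203.0.113.5", "/private-backup/db.sql"))
    print(blocker.handle("203.0.113.5", "/index.html"))
```

Blocking by IP is the simplest option, but it can catch innocent users behind a shared address and is easy to dodge by rotating IPs, so treat this as a starting point only.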