by omoikane 817 days ago
I have examples in my logs of GPTBot fetching only /robots.txt, with nothing from the same /24 block fetching anything else afterward, so at least that bot seems to respect robots.txt.
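
Here's a rough sketch of that kind of log check. It assumes the Apache/Nginx "combined" log format, IPv4 client addresses, lines in chronological order, and that GPTBot is recognized by the `GPTBot` token in the User-Agent (which anyone can spoof). The regex and function names are made up for illustration, not taken from any existing tool.

```python
import ipaddress
import re

# Assumed "combined" log format:
# IP - - [time] "METHOD PATH PROTO" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def paths_after_bot(lines, bot_token="GPTBot"):
    """For each /24 block that sent a request with bot_token in its User-Agent,
    collect every path requested from that block from the first bot hit onward.
    Assumes IPv4 addresses and chronologically ordered lines."""
    paths = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        block = ipaddress.ip_network(f"{m['ip']}/24", strict=False)
        if bot_token in m["ua"] and block not in paths:
            paths[block] = set()  # start tracking at the block's first bot request
        if block in paths:
            paths[block].add(m["path"])  # any client in the block, bot or not
    return paths


def blocks_that_only_fetched_robots(paths):
    """Blocks whose only request after the bot showed up was /robots.txt."""
    return sorted(b for b, p in paths.items() if p == {"/robots.txt"})
```

Anything that lands in `blocks_that_only_fetched_robots` matches the pattern above: the bot read robots.txt and nothing from its /24 came back for content.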

Maybe your question is "how do we know that whatever system GPTBot feeds downstream didn't just get your content via some other crawler that crawls your site?" I am not sure we have any defense against that, other than signalling via robots.txt that our content is not intended for AI use.
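
For the robots.txt signal itself: OpenAI documents `GPTBot` as its crawler's user-agent token, so a site-wide opt-out can look like the file below. The sketch checks it with Python's standard `urllib.robotparser`; the `is_allowed` helper and the `SomeOtherBot` name are just for illustration. It only shows what a compliant crawler would decide. robots.txt is advisory and does nothing about content picked up by some other crawler.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: opt out GPTBot, leave the site open to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        # prints "GPTBot False", then "SomeOtherBot True"
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/post/1"))
```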