by darekkay 359 days ago
ai.robots.txt contains a big list of AI crawlers to block, either through robots.txt or via server rules:

https://github.com/ai-robots-txt/ai.robots.tx

2 comments

This actually blocks a lot more than just AI crawlers. You shouldn’t use this without reviewing it in detail so that you understand what you are actually blocking.

For instance, it includes ChatGPT-User. This is not a crawler; it is the user-agent used when a ChatGPT user pastes in a link and asks ChatGPT about the contents of the page.

One of the entries is facebookexternalhit. When you share a link on Facebook, Threads, WhatsApp, etc., this is the user-agent Meta uses to fetch the OpenGraph metadata to display things like the title and thumbnail.

Skimming through the list, I see a bunch of things like this. Not every non-browser fetch is an AI crawler!
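
If you do want to start from the list, one way to make that review explicit is to filter it before generating your robots.txt. This is only a rough sketch: the function name, the `USER_TRIGGERED` set, and the sample input are my own illustration, not anything the repository ships, and GPTBot and CCBot are just example crawler names. Which agents belong in the keep set is a judgment call you have to make for your own site.

```python
# Hypothetical keep-list: agents that fetch a page because a person asked
# for it (link previews, a pasted URL), not to crawl the site in bulk.
USER_TRIGGERED = {"ChatGPT-User", "facebookexternalhit"}


def build_robots_txt(blocklist, keep=USER_TRIGGERED):
    """Return robots.txt text disallowing every agent in blocklist except those in keep."""
    # robots.txt user-agent matching is case-insensitive, so compare in lowercase.
    keep_lower = {name.lower() for name in keep}
    blocked = [name for name in blocklist if name.lower() not in keep_lower]

    # One group: several User-agent lines sharing a single Disallow rule.
    lines = [f"User-agent: {name}" for name in blocked]
    if lines:
        lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative input only, not copied from the actual list.
    sample = ["GPTBot", "ChatGPT-User", "CCBot", "facebookexternalhit"]
    print(build_robots_txt(sample), end="")
```

With the sample input, only GPTBot and CCBot end up in the generated group; the two user-triggered agents are left alone.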

Your link is missing the t at the end of .txt. You should be able to edit it though.