by lucas_crocker 767 days ago
Also, did I miss anything? I know some people were suggesting that you stop Google from indexing everything by updating your robots.txt file.

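robots.txt isn't an ideal way of preventing pages from being indexed. It controls crawling, not indexing, so a disallowed URL can still show up in search results if other pages link to it.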
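To make that concrete, here's a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot name, and the example.com URLs are made up for illustration. The only question the file answers is "may this bot fetch this URL?", and nothing in it says "keep this out of the index".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the rules above allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler checks this before fetching. Nothing enforces it.
print(may_crawl("ExampleBot", "https://example.com/private/report"))
print(may_crawl("ExampleBot", "https://example.com/blog/post"))
```

A crawler that never runs a check like this just fetches the page anyway, which is the weak point of relying on robots.txt.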

X-Robots-Tag HTTP headers are more reliable: https://developers.google.com/search/docs/crawling-indexing/...
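For anyone who hasn't set one of these, here's a minimal sketch of a response carrying the header, using only Python's built-in `http.server`. The handler name, page body, and port are placeholders I picked; in practice you'd more likely set the header in your web server or framework config. Note the crawler has to be allowed to fetch the page in order to see the header at all, so don't also block the same URL in robots.txt.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoIndexHandler(BaseHTTPRequestHandler):
    """Toy handler that asks search engines not to index its responses."""

    def do_GET(self):
        body = b"<h1>Not for search results</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # The directive itself: compliant crawlers drop the page from the index.
        self.send_header("X-Robots-Tag", "noindex")
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve on localhost until interrupted (port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), NoIndexHandler).serve_forever()
```

Calling `serve()` and then running `curl -I http://127.0.0.1:8000/` should show the `X-Robots-Tag: noindex` line among the response headers.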

Regarding AI crawlers, it's a bit trickier, since they aren't going to abide by your rules. Cloudflare have tools: https://blog.cloudflare.com/ai-bots/

How effective these are, though, I don't know.

Oh, that's interesting. Yeah, paywalls might end up being the most effective way of keeping AI bots out, but even those could be circumvented. It's a tricky problem for sure.