by rocky_raccoon 818 days ago
Not that I'm arguing for or against preventing access from AI crawlers, but wouldn't it make more sense to block them at a higher level, e.g. the webserver, and not even give them the choice to obey/disobey robots.txt?

How would you propose doing so?
Off the top of my head:

- Cloudflare

- Webserver-level user-agent blocking (Apache, nginx)

- Application-level user-agent blocking (`if 'GPTBot' in request.headers.get('User-Agent', '')`)

None of them are ideal since you can simply change your user agent, but all of them seem like better options than robots.txt to me.
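A rough sketch of the application-level option, assuming a Python WSGI app. It uses a case-insensitive substring match rather than an exact comparison, because real crawler user agents are long strings that only contain a token such as `GPTBot`. The token list (`GPTBot`, `CCBot`) is illustrative and incomplete, and the names `BlockCrawlers`, `is_blocked` and `hello_app` are hypothetical, not from any library.

```python
from typing import Callable, Iterable

# Illustrative, incomplete list of crawler user-agent tokens (an assumption;
# check each vendor's documentation for the current strings).
BLOCKED_AGENT_TOKENS = ("GPTBot", "CCBot")


def is_blocked(user_agent: str, tokens: Iterable[str] = BLOCKED_AGENT_TOKENS) -> bool:
    """Return True if any blocked token appears in the User-Agent (case-insensitive)."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


class BlockCrawlers:
    """Hypothetical WSGI middleware: 403 for matching User-Agents, else pass through."""

    def __init__(self, app: Callable, tokens: Iterable[str] = BLOCKED_AGENT_TOKENS):
        self.app = app
        self.tokens = tuple(tokens)

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_blocked(user_agent, self.tokens):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Not a blocked crawler: hand the request to the wrapped application.
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Tiny stand-in application for the example."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


app = BlockCrawlers(hello_app)
```

To try it locally, `app` can be served with the standard library's `wsgiref.simple_server.make_server`. It has the same weakness as the other options: a crawler that sends a different user agent goes straight through.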

We could repurpose the evil bit.
One second, let me google this.

e: Okay, this is funny.

Web servers can check the user-agent and block the request.

E.g. in nginx you can match against the `$http_user_agent` variable.