Hacker News
by CableNinja 1041 days ago
I chose to use an nginx rule, because I also don't trust them to follow robots.txt. Returning a 410 Gone should, in theory, keep them from coming back too, assuming they actually drop the URL when they receive it, as they should.

`if ($http_user_agent ~* "(GPTBot|AI)") { return 410; }`

It's not perfect, but it should filter them indefinitely. I'll probably have to add more terms over time.
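
Since the list of terms is likely to grow, one option is to move the match into a `map` so each term gets its own line. This is only a rough sketch, not something tested against real crawler traffic: `$block_ai_crawler` and `ExampleBot` are made-up names for illustration, and only `GPTBot` and `AI` come from the rule above.

```nginx
# Goes in the http {} context.
# $block_ai_crawler is a hypothetical variable name chosen for this sketch.
map $http_user_agent $block_ai_crawler {
    default        0;
    ~*GPTBot       1;
    # ~* is case-insensitive, so this also matches any user agent
    # that merely contains "ai" somewhere in the string.
    ~*AI           1;
    # Placeholder for terms added later; not a real crawler name.
    ~*ExampleBot   1;
}

server {
    # Existing listen / server_name / location directives stay as they are.

    # nginx treats "0" and the empty string as false in an if condition.
    if ($block_ai_crawler) {
        return 410;
    }
}
```

The behaviour should be the same as the one-line `if`; the only gain is that adding a term becomes a one-line change.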

1 comment

That's relying on the user agent, though. That's not a trustworthy enough signal for me. For one, crawlers can use any user agent string they like. For another, I don't know what all the possible user agent strings are.