Hacker News
by cullumsmith 45 days ago
I simply block all AI crawlers with a user-agent check in nginx.conf.
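
For anyone wondering what such a check might look like, here is a minimal sketch. The poster's actual config is not shown, so everything below is an assumption: the variable name $is_ai_crawler, the server block, and the crawler list are all illustrative, and the list is far from exhaustive.

```nginx
# Goes inside the http { } block of nginx.conf.
# Map the User-Agent header to a flag; ~* means case-insensitive regex match.
# The tokens below are examples only, not a complete list.
map $http_user_agent $is_ai_crawler {
    default          0;
    ~*GPTBot         1;
    ~*ClaudeBot      1;
    ~*CCBot          1;
    ~*Bytespider     1;
    ~*PerplexityBot  1;
}

server {
    listen 80;
    server_name example.com;  # placeholder hostname

    # Refuse flagged requests before any content is served.
    if ($is_ai_crawler) {
        return 403;
    }

    location / {
        root /var/www/html;  # placeholder document root
    }
}
```

Note that this only catches crawlers that identify themselves honestly in the User-Agent header; anything spoofing a browser string passes straight through.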
4 comments

I also block all AI crawlers. I don't see why I should hand over my content for them to rip it off and make money from it through training or agents. A lot of AI companies are trying to make their requests indistinguishable from regular browsers on residential connections, so unfortunately I have to use Cloudflare to block them.

Ideally I'd make the content available to crawlers for training open models, but that seems to be nearly impossible. It would be possible if other AI companies behaved.

>so unfortunately I have to use Cloudflare to block them.

That can’t block Grok, can it?

(You might have a fake iPhone or something visit your site if you ask Grok to retrieve information from it)

What's the IP address of the supposed iPhone? Does it come from T-Mobile or from xAI?
Residential, I thought? It might even have been someone on here who posted about watching their server logs while they messaged Grok themselves.

Curious if xAI has a phone farm. Maybe just running simulators on servers?

Residential proxies are a commodity at this point. You can also run your own network and try to get it misclassified as residential.
This works for a few weeks to months. Then they detect that your site is hostile to them and enable evasion mode, with random IP addresses and user-agent strings. Proxies are expensive, so at least they're losing money.
*some AI crawlers. Not many
I started blocking some of them. But for now I want to improve visibility before further blocking or optimising. The dashboard helps with this.