by dudus 450 days ago
This won't stop the big ones, Google, Meta, OpenAI, Perplexity, or even the Chinese Govt. But it will make it harder for new entrants.

Not sure it’s targeted at them, either. Which of those entities have misbehaving bots? Seems like Google, at least, should be following robots.txt?
Google 2025 is not the Google you remember and respect.

"GoogleAssociationService bot was kind enough to ask 1,000,000+ times yesterday for the same file from 4000+ Google IP addresses. Answer was the same 404 - File Not Found. The User-Agent does not provide a support link unlike their other bots." -- https://en.osm.town/@osm_tech/114205536438977922

Google absolutely does run "misbehaving bots", and has all the world-renowned user support it's well known for from the teams running them, which means your best, perhaps only, option is to firewall off all Google ASNs.

With Google search's decline in usefulness and its plummeting referral traffic, combined with their unashamed AI-grifting copyright infringement and IP theft, the tradeoff in the old thinking of "I need to let Google crawl my site because I still naively believe SEO will make my business successful" is rapidly moving towards "Fuck you Google, you don't get anything I publish for free anymore."

Thanks for the link. Apparently Google has more bots than I thought. But is that really a Google bot or is someone else using their name? I don't see 'GoogleAssociationService' listed in their documentation [1].

They do say it’s from Google IP addresses, but it might be someone running a bot in Google Cloud? Maybe they checked that, but we can’t tell from a tweet.

Seems like a reasonable approach might be to whitelist the documented Google bots and block others.

[1] https://developers.google.com/search/docs/crawling-indexing/...
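
For what it's worth, here is a rough sketch of what "whitelist the documented bots and block others" could look like. Everything specific in it is an assumption for illustration: the CIDR blocks are placeholder documentation ranges (not real Google ranges), the token set only contains "googlebot" as a stand-in for whatever the documentation in [1] lists, and `decide` is a made-up helper name. A real setup would load the actual published ranges and probably do this at the firewall or reverse proxy rather than in application code.

```python
import ipaddress

# ASSUMPTION: stand-in for the user-agent tokens listed in Google's crawler docs.
DOCUMENTED_TOKENS = {"googlebot"}

# ASSUMPTION: placeholder documentation ranges, NOT real Google crawler ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def ip_in_allowlist(ip: str) -> bool:
    """Return True if ip parses and falls inside one of the allowed networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in ALLOWED_NETWORKS)


def decide(ip: str, user_agent: str) -> str:
    """Classify a request as 'allow', 'block', or 'pass' (not claiming to be Google)."""
    ua = user_agent.lower()
    if "google" not in ua:
        return "pass"  # leave the request to whatever other rules apply
    claims_documented = any(token in ua for token in DOCUMENTED_TOKENS)
    if claims_documented and ip_in_allowlist(ip):
        return "allow"
    # Either an undocumented "Google" bot or a documented name from an unlisted IP.
    return "block"


if __name__ == "__main__":
    print(decide("192.0.2.10", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(decide("192.0.2.10", "GoogleAssociationService"))
    print(decide("203.0.113.5", "Googlebot/2.1"))
```

The point of checking both the name and the IP is that either one alone is easy to get wrong: a user agent can be spoofed by anyone, and a Google-owned IP could just be someone's Google Cloud instance.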