by nulld3v 409 days ago

> I doubt many people are doing things to allow Googlebot but also ban other search crawlers.

Sadly this is just not the case.[1][2] Google knows this too, so they explicitly crawl from a specific IP range that they publish.[3]

I also know this, because I had a website that blocked any bots outside of that IP range. We had honeypot links (hidden to humans via CSS) that insta-banned any user or bot that clicked/fetched them. User-Agent from curl, wget, or any HTTP lib = insta-ban. Crawling links sequentially across multiple IPs = all banned. Any signal we found that indicated you were not a human using a web browser = ban.
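A rough Python sketch of that kind of rule set, for illustration only. It is not the site's actual code. The CIDR ranges are documentation-only placeholders standing in for the published crawler ranges, and the honeypot path, the User-Agent tokens, and all function names are made up. It also assumes allowlisted crawler IPs are never banned, which the comment above does not spell out.

```python
import ipaddress

# Placeholder ranges (RFC 5737 / RFC 3849 documentation addresses), standing in
# for the crawler IP ranges a search engine publishes. Not real crawler ranges.
ALLOWED_CRAWLER_CIDRS = ["192.0.2.0/24", "2001:db8::/32"]

# Hypothetical honeypot URL: linked in the page but hidden from humans via CSS.
HONEYPOT_PATHS = {"/trap/do-not-follow"}

# Hypothetical substrings that identify command-line tools and HTTP libraries.
BLOCKED_UA_TOKENS = ("curl", "wget", "python-requests", "go-http-client")

NETWORKS = [ipaddress.ip_network(cidr) for cidr in ALLOWED_CRAWLER_CIDRS]


def in_allowed_range(ip, networks=NETWORKS):
    """Return True if ip falls inside any allowlisted crawler network."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in networks)


def should_ban(ip, user_agent, path, networks=NETWORKS):
    """Decide whether a single request should trigger a ban."""
    # Assumption: traffic from the published crawler ranges is always allowed.
    if in_allowed_range(ip, networks):
        return False
    # Fetching a honeypot link means the client followed a hidden link.
    if path in HONEYPOT_PATHS:
        return True
    # A tool or library User-Agent is treated as a non-browser signal.
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)
```

Calling `should_ban("198.51.100.7", "curl/8.0", "/")` returns `True`, while the same request from an address inside the allowlisted ranges returns `False`. Detecting sequential crawling across multiple IPs needs per-session state and is left out of this sketch.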

We were listed on Google and never had traffic issues.

[1] https://onescales.com/blogs/main/the-bot-blocklist

[2] Chart in the middle of this page: https://blog.cloudflare.com/declaring-your-aindependence-blo... (note: Google-Extended != Googlebot)

[3] https://developers.google.com/search/docs/crawling-indexing/...