by nulld3v
409 days ago
> I doubt many people are doing things to allow Googlebot but also ban other search crawlers.

Sadly this is just not the case.[1][2] Google knows this too, so they explicitly crawl from a specific IP range that they publish.[3]

I also know this because I had a website that blocked any bots outside of that IP range. We had honeypot links (hidden from humans via CSS) that insta-banned any user or bot that clicked/fetched them. User-Agent from curl, wget, or any HTTP lib = insta-ban. Crawling links sequentially across multiple IPs = all banned. Any signal we found that indicated you were not a human using a web browser = ban. We were listed on Google and never had traffic issues.

[1] https://onescales.com/blogs/main/the-bot-blocklist
[2] Chart in the middle of this page: https://blog.cloudflare.com/declaring-your-aindependence-blo... (note: Google-Extended != Googlebot)
[3] https://developers.google.com/search/docs/crawling-indexing/...
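For illustration, here is a minimal Python sketch of the kind of allowlist-plus-ban policy described above. It is a sketch under assumptions, not the actual setup from that site. The prefixes are placeholder documentation ranges (192.0.2.0/24, 2001:db8::/32), not Google's real list; in practice you would load the ranges Google publishes [3]. The honeypot path `/trap-link`, the User-Agent regex, and the `decide` function are all hypothetical. The "sequential crawling across multiple IPs" heuristic is not covered.

```python
import ipaddress
import re

# PLACEHOLDER prefixes (RFC 5737 / RFC 3849 documentation ranges), NOT Google's list.
# A real deployment would load the crawler ranges Google publishes.
ALLOWED_CRAWLER_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

# Hypothetical honeypot path: linked from pages but hidden from humans via CSS.
HONEYPOT_PATHS = {"/trap-link"}

# Illustrative (not exhaustive) User-Agent markers for self-identified bots
# and common HTTP tools/libraries.
BOT_UA_PATTERN = re.compile(
    r"bot|crawler|spider|curl|wget|python-requests|python-urllib|go-http-client|okhttp|libwww-perl",
    re.IGNORECASE,
)

# In-memory ban list; a real site would persist this somewhere shared.
banned_ips: set = set()


def is_allowed_crawler(ip: str) -> bool:
    """True if the address falls inside one of the allowlisted prefixes."""
    addr = ipaddress.ip_address(ip)
    # Mixed IPv4/IPv6 membership checks simply return False.
    return any(addr in net for net in ALLOWED_CRAWLER_PREFIXES)


def decide(ip: str, path: str, user_agent: str) -> str:
    """Return "allow" or "ban" for one request under this hypothetical policy."""
    # 1. Requests from the published crawler ranges are always let through,
    #    even if they fetch the hidden honeypot link.
    if is_allowed_crawler(ip):
        return "allow"
    # 2. Anyone already banned stays banned.
    if ip in banned_ips:
        return "ban"
    # 3. Outside the allowlist, fetching a honeypot link or sending a
    #    bot/HTTP-library User-Agent counts as a non-human signal.
    if path in HONEYPOT_PATHS or BOT_UA_PATTERN.search(user_agent):
        banned_ips.add(ip)
        return "ban"
    return "allow"
```

The allowlist check runs first because a legitimate crawler can follow a CSS-hidden link just as easily as a scraper can. A Googlebot User-Agent coming from outside the listed ranges is therefore treated as spoofed and banned.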