Hacker News
by jsheard 635 days ago
It is trivial to detect fake Googlebot traffic (Google provides ways to verify it), and Cloudflare already does so. See for yourself:

  curl -I -H "User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/105.0.5195.102 Safari/537.36" https://www.cloudflare.com
They'll immediately flag the request as malicious and return 403 Forbidden, even if your IP address is otherwise reputable.
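For illustration, the behavior described above can be sketched in Python: reject a request whose User-Agent claims to be Googlebot but whose source IP does not check out. This is not Cloudflare's actual implementation, which isn't described here. `status_for_request` and its `verify_ip` hook are hypothetical names, and the hook stands in for whatever IP check a site really uses. General IP reputation plays no part in the decision, which matches the observation that a reputable IP still gets a 403.

```python
import re
from typing import Callable

# Matches the "Googlebot" token anywhere in the User-Agent, ignoring case.
GOOGLEBOT_UA = re.compile(r"googlebot", re.IGNORECASE)


def status_for_request(
    user_agent: str,
    client_ip: str,
    verify_ip: Callable[[str], bool],
) -> int:
    """Return the HTTP status a hypothetical edge filter would send.

    A request whose User-Agent claims to be Googlebot, but whose source IP
    fails `verify_ip`, is treated as spoofed and rejected with 403.
    Everything else is passed through (200 here, for simplicity).
    """
    claims_googlebot = GOOGLEBOT_UA.search(user_agent) is not None
    # verify_ip is only called for requests that claim to be Googlebot,
    # so ordinary traffic never pays for the (potentially slow) IP check.
    if claims_googlebot and not verify_ip(client_ip):
        return 403
    return 200
```

The User-Agent is only used to decide *whether* to verify. It is never trusted on its own, since any client can send the header from the curl example.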

Now try it from a Google Cloud VM.
Pretty sure that won't work. Google lets you validate whether an IP address is used by Googlebot specifically, not just whether it is owned by Google in general, and I doubt they are foolish enough to use the same pool of IP addresses for their internal crawlers and their public cloud.

https://developers.google.com/search/docs/crawling-indexing/...
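Google's documented verification works in two DNS steps. First, do a reverse lookup on the client IP and check that the hostname is under a Google domain. Then do a forward lookup on that hostname and confirm it resolves back to the same IP. Below is a minimal Python sketch. It assumes we only want to accept Googlebot itself, meaning hostnames under googlebot.com. Google's docs list other domains for other crawlers and fetchers. The `reverse`/`forward` parameters are illustrative hooks, not part of any API, so the DNS calls can be swapped out. The defaults use the standard `socket` module.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Tuple

# Assumption: accept only the main Googlebot crawler, whose reverse DNS names
# end in googlebot.com. Widen this tuple to accept other Google crawlers.
ALLOWED_SUFFIXES: Tuple[str, ...] = (".googlebot.com",)


def _default_reverse(ip: str) -> str:
    # PTR lookup: primary hostname registered for the IP.
    return socket.gethostbyaddr(ip)[0]


def _default_forward(host: str) -> Iterable[str]:
    # A/AAAA lookup: every address the hostname resolves to.
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _default_reverse,
    forward: Callable[[str], Iterable[str]] = _default_forward,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        target = ipaddress.ip_address(ip)
        # Step 1: reverse lookup, normalized (no trailing dot, lower case).
        host = reverse(ip).rstrip(".").lower()
        # Step 2: the PTR name must sit under an allowed domain.
        if not host.endswith(ALLOWED_SUFFIXES):
            return False
        # Step 3: the name must resolve back to the same IP. A PTR record is
        # controlled by whoever owns the IP block, so without this step
        # anyone could publish a googlebot.com-looking name for their own IP.
        return any(ipaddress.ip_address(a) == target for a in forward(host))
    except (OSError, ValueError):
        # Malformed IP, missing PTR record, or failed DNS lookup.
        return False
```

With the default resolvers this makes two live DNS queries per call, so a real deployment would likely cache the result per IP rather than resolving on every request.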

It depends on how the site has implemented it. A huge number just check AS origination (whether the IP is announced by a Google autonomous system) and whether the reverse DNS name matches *googleusercontent.com.
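The gap between the two approaches shows up clearly with plain prefix matching. An ownership- or AS-level check asks whether an IP is in *any* Google network, which a Google Cloud VM would likely pass. A crawler-specific check asks whether the IP is in the ranges published for Googlebot. Below is a minimal Python sketch. The JSON shape (a `prefixes` list whose entries carry `ipv4Prefix` or `ipv6Prefix`) is an assumption, modeled loosely on the crawler range files Google publishes, so check it against the current docs. The data in the usage section uses documentation address ranges as placeholders, not real Google prefixes.

```python
import ipaddress
import json
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_prefixes(doc: str) -> List[Network]:
    """Parse {"prefixes": [{"ipv4Prefix": ...}, {"ipv6Prefix": ...}]} into networks.

    The field names are an assumption about the range-file format.
    """
    nets: List[Network] = []
    for entry in json.loads(doc).get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            nets.append(ipaddress.ip_network(cidr))
    return nets


def ip_in_ranges(ip: str, nets: List[Network]) -> bool:
    """True if `ip` falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # Membership across IP versions (v4 address, v6 network) is simply False.
    return any(addr in net for net in nets)


if __name__ == "__main__":
    # Placeholder data using documentation ranges (RFC 5737 / RFC 3849),
    # NOT real Google prefixes.
    googlebot_doc = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/27"}, {"ipv6Prefix": "2001:db8:10::/64"}]}'
    all_google_doc = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv4Prefix": "198.51.100.0/24"}]}'
    cloud_vm_ip = "198.51.100.5"  # hypothetical cloud VM address
    print(ip_in_ranges(cloud_vm_ip, load_prefixes(all_google_doc)))  # True
    print(ip_in_ranges(cloud_vm_ip, load_prefixes(googlebot_doc)))   # False
```

The same VM address passes the broad "owned by Google" test and fails the Googlebot-specific one. That is exactly the difference between the lax implementations described above and the validation Google actually offers.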