Hacker News
by 1vuio0pswjnm7 1368 days ago
Unless the originating IP address is a Google-controlled one, using Googlebot as a User-Agent header is (IME) generally no better than not sending a UA header at all.^1 If the goal is to make a server believe a request is coming from Google, then the request needs to be sent from a publicised Google-controlled IP address.^2
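On "not sending a UA header at all": many HTTP client libraries add one for you. Python's `urllib.request`, for instance, sends `Python-urllib/<version>` by default. Writing the request yourself is one way to be sure none goes out. This is a minimal sketch, assuming HTTPS on port 443. `build_get_request` and `fetch` are hypothetical helper names, not part of any library.

```python
import socket
import ssl
from typing import Optional


def build_get_request(host: str, path: str = "/", user_agent: Optional[str] = None) -> bytes:
    """Build a raw HTTP/1.1 GET request with no User-Agent line unless one is given."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if user_agent is not None:
        # Only needed for the small handful of sites that refuse UA-less requests.
        lines.append(f"User-Agent: {user_agent}")
    lines.append("Connection: close")  # ask the server to close the connection when done
    # Header lines end in CRLF; an empty line (CRLF CRLF) terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host: str, request: bytes, port: int = 443, timeout: float = 10.0) -> bytes:
    """Send the raw request over TLS and return the raw response (headers + body)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(request)
            chunks = []
            while True:
                chunk = tls.recv(4096)
                if not chunk:  # server closed the connection
                    break
                chunks.append(chunk)
    return b"".join(chunks)
```

For example, `fetch("example.com", build_get_request("example.com"))` sends no UA at all. For the holdouts, pass `user_agent="..."`. The response comes back unparsed, so chunked transfer encoding is not decoded here.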

1. For many years I have had great results with not sending a UA header. It is also, IMO, an effective means to discover the true number of websites that refuse to fulfill a request in the absence of a UA header, which IME is extremely small. For that small handful of sites, one can send a "fake" UA header of one's choosing. sec.gov is an example of such a site.

2. http://developers.google.com/static/search/apis/ipranges/goo...
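On the server side, the check implied by footnote 2 is plain CIDR membership against Google's published ranges. This is a minimal sketch using Python's standard `ipaddress` module. It assumes the range file is JSON with a `prefixes` list whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key. The sample data uses the RFC 5737 / RFC 3849 documentation ranges, not real Google addresses. `load_networks` and `ip_in_ranges` are hypothetical names.

```python
import ipaddress
import json

# ASSUMED file shape; the prefixes are documentation ranges, NOT Google's.
SAMPLE_RANGES_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
"""


def load_networks(raw_json: str) -> list:
    """Parse the range file into ip_network objects (IPv4 and IPv6 mixed)."""
    data = json.loads(raw_json)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def ip_in_ranges(ip: str, networks: list) -> bool:
    """True if the client IP falls inside any published prefix."""
    addr = ipaddress.ip_address(ip)
    # `addr in net` is simply False (not an error) when the IP versions differ.
    return any(addr in net for net in networks)
```

This is a linear scan, which is adequate for a few hundred prefixes. Load the file once and reuse the list rather than re-parsing it per request.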


Interestingly, lite.duckduckgo.com recently started requiring a User-Agent header, after many years of operating without this requirement. Are there any enforceable limits on what DDG can do with the UA header data? There has been no update to DDG's privacy policy.

I wonder whether fake-bot detectors can distinguish between any Google IP, such as a GCP instance (i.e., do they simply check the ASN?), and crawler-specific IPs.

Or maybe Google's crawler also runs on GCP, making it indistinguishable from regular $5 compute users.

Yes, Google and most major search engines support a reverse DNS (rDNS) lookup so you can validate that a crawler really is Googlebot.
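The reverse lookup alone isn't sufficient, because whoever controls an IP block also controls its PTR records. The usual procedure is forward-confirmed reverse DNS:

1. Look up the PTR name.
2. Check that it sits under a crawler domain.
3. Resolve that name and confirm it maps back to the same IP.

This is what would separate a real crawler from an arbitrary cloud VM, assuming ordinary customers can't obtain PTR names under those domains. Below is a minimal sketch. The suffix list (`googlebot.com`, `google.com`) is an assumption to check against Google's current verification docs. The lookup functions are injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Tuple

# ASSUMPTION: domains accepted as Google crawler hostnames. Check Google's
# current "verify Googlebot" documentation before relying on this list.
GOOGLE_CRAWLER_SUFFIXES: Tuple[str, ...] = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup (IP -> hostname); raises OSError when there is no record."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


def forward_lookup(hostname: str) -> list:
    """A/AAAA lookup (hostname -> every address it resolves to)."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_verified_crawler(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
    suffixes: Tuple[str, ...] = GOOGLE_CRAWLER_SUFFIXES,
) -> bool:
    """Forward-confirmed reverse DNS check for an IP claiming to be a crawler."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record: cannot be verified
    if not hostname.endswith(suffixes):
        return False  # PTR name is outside the accepted crawler domains
    try:
        addresses = forward(hostname)
    except OSError:
        return False  # the claimed hostname does not resolve
    # The PTR record is set by whoever owns the IP block, so the hostname
    # must also resolve back to the original IP to count as verified.
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(a) == target for a in addresses)
```

Two DNS round trips per request are slow, so a real server would likely cache the verdict per IP for a while.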