by cube00 3351 days ago
Sure, go join the ranks of shady SEOs and malware. If I really want to stop you, I'll know you're not coming from a Google IP range. https://www.incapsula.com/blog/was-that-really-a-google-bot-...

However, consider what your ultimate end game is. If it's a website you expect visitors to find through Google or the Play Store, good luck once webmasters start reporting your misbehaving "Googlebot" crawler.

2 comments

Unless you do it from a Google Cloud instance, that is.
>Unless you do it from a Google Cloud instance, that is.

What's the reverse DNS for Google Cloud IPs? Google says to check that Googlebot's IP resolves to either a .google.com or .googlebot.com domain.

https://support.google.com/webmasters/answer/80553?hl=en
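For anyone who wants to see what that check looks like, here is a rough sketch in Python. It does the reverse lookup, tests the hostname against the two suffixes mentioned above, and then does a forward lookup to confirm the name maps back to the same IP (a PTR record alone can be set by whoever controls the address block). The function name and the injectable `reverse_lookup` / `forward_lookup` parameters are my own, added so it can be tried without touching the network. The default forward lookup is IPv4 only.

```python
import socket

# Suffixes taken from the comment above; the leading dot prevents a
# hostname like "evilgooglebot.com" from matching.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a reverse-then-forward DNS check.

    `reverse_lookup` and `forward_lookup` are hypothetical hooks for
    testing; by default the standard library resolver is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses).
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return False

    host = host.rstrip(".").lower()
    if not host.endswith(GOOGLEBOT_SUFFIXES):
        return False

    try:
        # The name must resolve back to the address we started from.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A request from a Google Cloud instance would fail the suffix test here, since its reverse DNS falls under a different domain.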

Couldn't you use GWT Mobilizer to scrape a site then index that?

Like this: http://i.imgur.com/ocR54Yq.jpg

Good point -- although it makes sense why it isn't frequently implemented. DNS lookups aren't cheap for this kind of thing.
It would be sufficient to let some requests come through from "Googlebot", and then deal with them (block, rate-limit, whatever) once the DNS check has been completed.
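Roughly what that could look like, as a sketch only: the first request from an IP claiming to be Googlebot is let through while the DNS check runs on a background thread, and later requests get blocked once the check has come back negative. The class name, the `"allow"` / `"block"` strings, and the `verify` callback are all placeholders of mine; `verify` stands in for whatever slow check you use.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class DeferredBotCheck:
    """Allow requests until a slow verification finishes, then enforce it."""

    def __init__(self, verify, max_workers=4):
        self._verify = verify  # callable: ip -> bool (may be slow)
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._futures = {}  # ip -> Future holding the verification result

    def decide(self, ip):
        """Return "allow" or "block" without waiting on DNS."""
        with self._lock:
            future = self._futures.get(ip)
            if future is None:
                # First sighting: start the check in the background.
                future = self._pool.submit(self._verify, ip)
                self._futures[ip] = future

        if not future.done():
            return "allow"  # verdict still pending, let it through
        if future.exception() is not None:
            return "block"  # treat a failed check as unverified
        return "allow" if future.result() else "block"

    def wait_for(self, ip, timeout=None):
        """Block until the check for `ip` completes (mainly for tests)."""
        wait([self._futures[ip]], timeout=timeout)
```

The dict grows without bound as written, so a real deployment would want expiry on the entries, and "block" could just as well be a rate limit.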
.googleusercontent.com
You know you have to be able to crawl your own or client sites as Googlebot, using Screaming Frog or DeepCrawl, to identify any crawl or crawl-budget issues.