


	by cube00 3351 days ago
	Sure, go join the realms of shady SEOs and malware. If I really want to stop you, I'll know you're not coming from a Google IP range: https://www.incapsula.com/blog/was-that-really-a-google-bot-... However, consider what your ultimate endgame is. If it's a website you expect visitors to find through Google or the Play Store, good luck once webmasters start reporting your misbehaving "Googlebot" crawler.

2 comments

lend000 3351 days ago

Unless you do it from a Google Cloud instance, that is.


mootothemax 3351 days ago

>Unless you do it from a Google Cloud instance, that is.

What's the reverse DNS for Google Cloud IPs? Google says to check that Googlebot's IP resolves to either a .google.com or .googlebot.com domain.

https://support.google.com/webmasters/answer/80553?hl=en
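For illustration, the check described above could be sketched roughly as follows. This is only a sketch under assumptions: the function name `is_verified_googlebot` and the injectable `reverse_lookup` / `forward_lookup` parameters are made up for this example, the suffix list contains just the two domains mentioned in this comment, and the forward lookup via `socket.gethostbyname_ex` only covers IPv4. The forward-confirmation step (resolving the hostname back and checking that it returns the same IP) is included because a reverse record alone can be set by whoever controls the IP block.

```python
import socket

# Only the two suffixes named in the comment above; Google's docs may list more.
GOOGLEBOT_SUFFIXES = (".google.com", ".googlebot.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Googlebot-style hostname
    and that hostname resolves forward to the same `ip`.

    The two lookup parameters are hypothetical hooks so the logic can be
    exercised without real DNS; by default the stdlib resolver is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith(GOOGLEBOT_SUFFIXES):
            return False
        # Forward-confirm: the claimed hostname must map back to this IP.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # treat any failed lookup as "not verified".
        return False
```

Under this sketch, a request from a Google Cloud instance would fail the suffix check if its reverse DNS is under a different domain, whatever the User-Agent header says.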


milankragujevic 3351 days ago

Couldn't you use GWT Mobilizer to scrape a site then index that?

Like this: http://i.imgur.com/ocR54Yq.jpg


lend000 3351 days ago

Good point -- although it makes sense that it isn't frequently implemented. DNS lookups aren't cheap for this kind of thing.


Symbiote 3351 days ago

It would be sufficient to let some requests come through from "Googlebot", and then deal with them (block, rate-limit, whatever) once the DNS check has been completed.
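A rough sketch of that "let it through first, decide later" idea, assuming a thread pool for the background lookup. The class name `DeferredBotCheck` and the `verify` callable are hypothetical; `verify` stands in for whatever DNS check you use and is expected to return True for a genuine crawler. A real deployment would also want cache expiry and a cap on the number of tracked IPs, which are left out here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DeferredBotCheck:
    """Allow requests from an unknown IP while a slow check runs in the
    background, then enforce the verdict once it is available."""

    def __init__(self, verify, max_workers=4):
        self._verify = verify            # hypothetical: callable(ip) -> bool
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._pending = {}               # ip -> Future for the running check
        self._verdicts = {}              # ip -> bool, once known

    def allow(self, ip):
        """Return False only after the check has finished and failed."""
        with self._lock:
            if ip in self._verdicts:
                return self._verdicts[ip]
            future = self._pending.get(ip)
            if future is None:
                # First sighting: start the check, serve this request anyway.
                self._pending[ip] = self._pool.submit(self._verify, ip)
                return True
            if not future.done():
                # Check still running: keep serving for now.
                return True
            # Check finished: record the verdict (errors count as failure).
            ok = future.exception() is None and bool(future.result())
            self._verdicts[ip] = ok
            del self._pending[ip]
            return ok
```

The request path never waits on DNS; the cost is that a fake "Googlebot" gets a handful of requests through before the verdict lands, which is the trade-off described above. Returning False here just means "not verified", so the caller can block or rate-limit as it sees fit.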


hrrsn 3351 days ago

.googleusercontent.com


walshemj 3351 days ago

You know, you have to be able to crawl your own or clients' sites as Googlebot, using Screaming Frog or DeepCrawl, to identify any crawl or crawl-budget issues.
