by Scryptonite 4659 days ago
What if a different robots.txt is being served up for the real Googlebots?

EDIT:

Based on the comments in their robots.txt, it appears that they are whitelisting certain robots. To crawl their site, you would have to apply for permission at https://www.facebook.com/apps/site_scraping_tos.php

They probably serve a uniquely generated robots.txt to each whitelisted robot. You'd never know what rules it contains unless you were a whitelisted robot.
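Here is a minimal sketch of what per-robot serving might look like on the server. It assumes the choice is made from the User-Agent header alone. The bot names and rules are hypothetical and say nothing about what Facebook's real file contains.

```python
# Hypothetical per-crawler rules; names and paths are illustrative only.
WHITELIST = {
    "Googlebot": "User-agent: Googlebot\nDisallow: /private/\n",
    "Bingbot": "User-agent: Bingbot\nDisallow: /private/\n",
}

# Any client not on the whitelist gets a blanket disallow.
DEFAULT = "User-agent: *\nDisallow: /\n"


def robots_for(user_agent: str) -> str:
    """Return the robots.txt body to serve for a given User-Agent header."""
    ua = user_agent.lower()
    for bot, rules in WHITELIST.items():
        # Substring match, so a full UA string such as
        # "Mozilla/5.0 (compatible; Googlebot/2.1; ...)" still matches.
        if bot.lower() in ua:
            return rules
    return DEFAULT
```

Because this matches on the header alone, any client can claim to be Googlebot and receive the whitelisted rules. A stricter setup would also check where the request comes from.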

Or if your user-agent matches one of theirs. Unless they are whitelisting by IP.

I tested this with: wget --user-agent "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://www.facebook.com/robots.txt

and diffed it against a normal fetch, but the results are the same.
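The same check can be scripted by fetching robots.txt once with an ordinary User-Agent and once with the Googlebot string, then diffing the two bodies. This is a standard-library sketch. The "Wget/1.21" comparison UA is an arbitrary choice, and the output depends on whatever the server returns at the time you run it.

```python
import difflib
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
PLAIN_UA = "Wget/1.21"  # arbitrary non-crawler UA used for comparison


def fetch(url: str, user_agent: str) -> str:
    """Fetch url with the given User-Agent and return the body as text."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def diff_bodies(a: str, b: str) -> list:
    """Return unified-diff lines between two bodies; empty if they are identical."""
    return list(difflib.unified_diff(
        a.splitlines(), b.splitlines(),
        fromfile=PLAIN_UA, tofile="Googlebot", lineterm="",
    ))


def compare(url: str) -> list:
    """Diff the plain fetch of url against the Googlebot-UA fetch."""
    return diff_bodies(fetch(url, PLAIN_UA), fetch(url, GOOGLEBOT_UA))
```

Calling `compare("http://www.facebook.com/robots.txt")` returns an empty list when both fetches match, which corresponds to the "results are the same" outcome above. This only tests User-Agent switching. It cannot detect IP-based whitelisting.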

Is your IP range the same as Google's?
Google doesn't release its IP ranges. Googlebot can come from any IP.
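Even without published ranges, a site can still separate a real Googlebot from a spoofed User-Agent. Google's documented method is a reverse DNS lookup on the requesting IP. The hostname must end in googlebot.com or google.com, and a forward lookup on that hostname must resolve back to the same IP. This is a sketch of that check and does not claim to be how Facebook does it. `gethostbyname_ex` only returns IPv4 addresses, so this version assumes an IPv4 client.

```python
import socket

# Domains Google uses for its crawler hostnames.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def is_google_hostname(hostname: str) -> bool:
    """True if a reverse-DNS name belongs to one of Google's crawler domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_DOMAINS)


def is_real_googlebot(ip: str) -> bool:
    """Reverse-then-forward DNS check for an IPv4 address claiming to be Googlebot."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse (PTR) lookup
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(hostname)  # forward lookup
    except OSError:
        return False
    # Anyone controlling their own PTR record could fake the hostname,
    # so the name must also resolve back to the original address.
    return ip in addrs
```

A server would typically run `is_real_googlebot` only for requests whose User-Agent claims to be Googlebot, because the two DNS lookups add latency.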