by Scryptonite
4659 days ago
What if a different robots.txt is being served to the real Googlebots? EDIT: Based on the comments in their robots.txt, it appears that they are whitelisting certain robots. You would have to apply for your robot to crawl their site at https://www.facebook.com/apps/site_scraping_tos.php. They probably serve a uniquely generated robots.txt to each whitelisted robot, so you'd never know what rules it contains unless you are a whitelisted robot.
I tested this with: wget --user-agent "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://www.facebook.com/robots.txt
and diffed the result against a plain request, but the two are the same.
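
For anyone who wants to repeat the check without wget, here is a rough Python sketch of the same idea: fetch the URL once per User-Agent and diff the bodies. The function names (`fetch`, `diff_by_user_agent`) and the `DEFAULT_UA` string are made up for illustration, not anything from Facebook or Google. Keep in mind it only tests the User-Agent header; if the server decides based on source IP (for example by verifying the crawler's address), a spoofed header from your own machine would still get the ordinary file, so identical output doesn't settle the question.

```python
import difflib
import urllib.request

# User-Agent string taken from the wget command above.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Assumption: any ordinary, non-crawler User-Agent works as the baseline.
DEFAULT_UA = "Wget/1.21"


def fetch(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent header and return the body as text."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def diff_by_user_agent(url, ua_a, ua_b, fetcher=fetch):
    """Return unified-diff lines between the bodies served to ua_a and ua_b.

    An empty list means both User-Agents received identical content.
    fetcher is injectable so the comparison can be tried without network access.
    """
    body_a = fetcher(url, ua_a)
    body_b = fetcher(url, ua_b)
    return list(
        difflib.unified_diff(
            body_a.splitlines(),
            body_b.splitlines(),
            fromfile=ua_a,
            tofile=ua_b,
            lineterm="",
        )
    )


if __name__ == "__main__":
    lines = diff_by_user_agent(
        "http://www.facebook.com/robots.txt", DEFAULT_UA, GOOGLEBOT_UA
    )
    print("\n".join(lines) if lines else "identical")
```

Run as a script, it prints either the diff or "identical".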