Hacker News
by cleverjake 4973 days ago
I don't understand how these pages could have been crawled - could someone enlighten me?

It seems that Facebook uses robots.txt to block these pages

https://www.facebook.com/robots.txt

But depending on the number of inbound links, Google will index the URLs anyway.

It's a common issue.
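
To make the distinction concrete: robots.txt only tells a crawler which URLs it may fetch. It does not stop a URL from being listed if the search engine learns about it from links elsewhere. Here is a rough sketch of the "may I fetch this?" check using Python's standard urllib.robotparser. The rules and the example.com URLs are made up for illustration and are not Facebook's actual robots.txt; may_crawl is just a helper name I picked.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Facebook's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /login/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL under the disallowed prefix: a polite crawler will not fetch it.
blocked = may_crawl(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/login/recover")

# A URL outside the disallowed prefix: fetching is permitted.
allowed = may_crawl(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/about")

print(blocked, allowed)
```

Note that nothing in this check covers indexing. A crawler that honors the Disallow rule never reads the page, but the bare URL can still end up in the index because other sites link to it.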

Google ignores robots.txt if the number of inbound links is > N?

Also - any speculation as to how so many sites were linking to people's login pages?