by kazinator
3950 days ago
robots.txt is widely ignored. User-agent fields are faked to make robots look like Firefox on Windows. Anyone can write a crawler and have it report as Googlebot. That doesn't even violate robots.txt, which only says that if your name is Googlebot, you're allowed. Blocking crap requires cunning: code that looks for suspicious access patterns and responds. A genuine Googlebot should be operating from a Google domain: a reverse DNS lookup on the client IP of a Googlebot request should return a name in the ".googlebot.com" domain.
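The reverse-lookup check can be sketched roughly like this in Python. This is only a minimal sketch, not a hardened filter: the function name `is_probably_googlebot` and its parameters are made up for illustration, and the forward lookup that confirms the name maps back to the same IP is an extra step beyond the reverse lookup described above, added on the assumption that whoever controls an IP block also controls its PTR record and could point it at any name.

```python
import socket

# Suffix taken from the comment above; pass other suffixes if you trust them.
DEFAULT_SUFFIXES = (".googlebot.com",)


def is_probably_googlebot(ip,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex,
                          suffixes=DEFAULT_SUFFIXES):
    """Return True if `ip` reverse-resolves into a trusted domain and that
    name forward-resolves back to the same IP.

    `reverse` and `forward` are injectable (hypothetical parameters) so the
    logic can be exercised without touching real DNS.
    """
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        hostname = reverse(ip)[0]
    except OSError:
        # No PTR record or lookup failure: treat as not verified.
        return False

    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(tuple(suffixes)):
        return False

    try:
        # gethostbyname_ex is IPv4-only; an IPv6 client would need getaddrinfo.
        addresses = forward(hostname)[2]
    except OSError:
        return False

    # Forward-confirm: the claimed name must map back to the original IP.
    return ip in addresses
```

A request whose User-Agent says Googlebot but whose address fails this check is a candidate for the "responds" part. DNS lookups are slow, so in practice the result would be cached per IP rather than computed on every request.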