by 3pt14159
3954 days ago
Hey man, if I'm following your robots.txt it's all good, right? You have the power to only allow the people that give you benefit:

    User-agent: *
    Disallow: /

    User-agent: Googlebot
    Allow: /

    User-agent: Slurp
    Allow: /

    User-agent: bingbot
    Allow: /
|
|
Anyone can make a crawler and then have it report as Googlebot. That doesn't even violate the robots.txt; it says, if your name is Googlebot, you're allowed.
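To make that concrete, here is a rough sketch using Python's standard `urllib.robotparser` against the rules quoted above. The `is_allowed` helper and the `example.com` URL are made up for illustration; the point is only that the check keys on whatever name the client chooses to report.

```python
from urllib.robotparser import RobotFileParser

# The rules from the parent comment, one group per crawler name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: Slurp
Allow: /

User-agent: bingbot
Allow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/page") -> bool:
    """Hypothetical helper: would robots.txt let this self-reported name fetch url?"""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A crawler that reports an unknown name falls under "User-agent: *".
    print(is_allowed("SomeRandomScraper"))
    # The same crawler, simply calling itself Googlebot, matches the Googlebot group.
    print(is_allowed("Googlebot"))
```

Nothing in the file can tell the two callers apart, so any real filtering has to happen server-side.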
Blocking crap requires cunning: code that looks for suspicious access patterns and responds.
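A genuine Googlebot should be operating from a Google domain. If we do a reverse DNS lookup on the client IP of a Googlebot request, we get a hostname in the "googlebot.com" domain (or "google.com"). A forward lookup on that hostname should then return the same IP, because whoever controls an IP block's reverse DNS can make it claim any hostname they like.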
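A minimal sketch of that check in Python, using only the standard `socket` module. The function name, the suffix list, and the injectable lookup parameters are my own choices for illustration, not anything Google ships. It also confirms the result with a forward lookup, since a reverse record on its own can be set by whoever owns the IP block. Note that `socket.gethostbyname_ex` only handles IPv4.

```python
import socket

# Assumed set of hostname suffixes treated as Google's for this sketch.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(
    ip: str,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
) -> bool:
    """Reverse-resolve ip, check the domain, then forward-confirm the hostname."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no reverse record, or the lookup failed

    if not hostname.lower().rstrip(".").endswith(TRUSTED_SUFFIXES):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, IPv4 addresses).
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The claimed hostname must resolve back to the address we started from.
    return ip in addresses
```

The lookups are passed in as parameters so the logic can be exercised without touching the network; in a request handler you would call `is_genuine_googlebot(client_ip)` only when the User-Agent claims to be Googlebot, and cache the answer per IP.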