by easy_rider
4582 days ago
You can implement some strict enforcement in Apache using some crafty mod_rewrite rules:
http://andthatsjazz.org/defeat.html

The User-Agent is too easily spoofed, but we could check whether the robots are indeed Google (whitelisted) and not some other crawler that just wants to scrape your content. In the realm of mail servers we have something called SPF:
http://en.wikipedia.org/wiki/Sender_Policy_Framework

Just thinking outside the box here, but other than checking IP ranges: maybe the crawler could send a hash as a header in the GET request to verify that it is who it says it is.
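
To make the header idea a bit more concrete, here is a rough sketch in Python. It is only an illustration under some assumptions: the header names (`X-Crawler-Id`, `X-Crawler-Timestamp`, `X-Crawler-Signature`), the `WHITELIST` table and the shared secret are all made up, and no real crawler sends anything like this today. A bare hash could simply be copied by a scraper, so the sketch uses a keyed hash (HMAC) over the crawler id, the path and a timestamp.

```python
import hashlib
import hmac
import time

# Hypothetical header names, invented for this sketch.
ID_HEADER = "X-Crawler-Id"
TS_HEADER = "X-Crawler-Timestamp"
SIG_HEADER = "X-Crawler-Signature"

# Hypothetical whitelist: crawler id -> secret agreed on out of band.
WHITELIST = {"examplebot": b"shared-secret-for-examplebot"}

# Reject signatures older or newer than this, to limit replaying a captured header.
MAX_SKEW_SECONDS = 300


def sign_request(crawler_id, secret, path, timestamp):
    """Crawler side: keyed hash over who is asking, for what, and when."""
    message = f"{crawler_id}\n{path}\n{timestamp}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_verified_crawler(headers, path, now=None):
    """Server side: recompute the signature and compare it to the header."""
    now = time.time() if now is None else now
    crawler_id = headers.get(ID_HEADER, "")
    secret = WHITELIST.get(crawler_id)
    if secret is None:
        return False  # not a whitelisted crawler
    try:
        timestamp = int(headers.get(TS_HEADER, ""))
    except ValueError:
        return False  # missing or malformed timestamp
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # too old (or too far in the future)
    expected = sign_request(crawler_id, secret, path, timestamp)
    provided = headers.get(SIG_HEADER, "")
    # Constant-time comparison on bytes, so odd header content cannot raise.
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))


if __name__ == "__main__":
    ts = int(time.time())
    secret = WHITELIST["examplebot"]
    headers = {
        ID_HEADER: "examplebot",
        TS_HEADER: str(ts),
        SIG_HEADER: sign_request("examplebot", secret, "/index.html", ts),
    }
    print(is_verified_crawler(headers, "/index.html"))
    print(is_verified_crawler(headers, "/other.html"))
```

Unlike SPF, which publishes the allowed senders in DNS, this relies on a secret shared between the site and each crawler, so it would only work for crawlers you have an arrangement with.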