by fc417fc802
410 days ago
> then you set traps like disallowing something in robots.txt and then ban anything that tries to access it

That doesn't work at all when the scraper rapidly rotates IPs from different ASNs because you can't differentiate the legitimate from the abusive traffic on a per-request basis. All you can be certain of is that a significant portion of your traffic is abusive. That results in aggressive filtering schemes which in turn means permitted bots must be whitelisted on a case by case basis.
Well, sure you can. If it's requesting something that robots.txt allows, it's a legitimate request. It's only when it's requesting something disallowed that you have to start deciding whether to filter it or not.
What does it matter if they use multiple IP addresses to request only things you would have allowed them to request from a single one?
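
To make that concrete, here is a rough sketch of the per-request check using Python's standard `urllib.robotparser`. The robots.txt content, the `/trap/` path, and the `classify` helper are made up for illustration, not anyone's actual setup. Note that the decision never looks at the source IP, only at the requested path:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: everything is allowed
# except a trap directory that no well-behaved crawler should request.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def classify(parser: RobotFileParser, user_agent: str, path: str) -> str:
    """Classify one request by path alone; the client IP is never consulted.

    Returns "allow" if robots.txt permits the path for this user agent,
    otherwise "review", meaning the request is a candidate for filtering.
    """
    if parser.can_fetch(user_agent, path):
        return "allow"
    return "review"


parser = build_parser(ROBOTS_TXT)
```

With these assumed rules, `classify(parser, "AnyBot", "/index.html")` comes back as allowed no matter how many addresses the requests are spread across, and only a request such as `/trap/secret` gets flagged for a filtering decision.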