by rli
4931 days ago
robots.txt can be ignored; it's only a convention that honest spiders follow. I think the approach described above (listing top requestors, doing statistics on them, then automating the blocking) is indeed the best way. There may also be a blocklist or two of malicious scrapers around. And if there isn't, that's a new business proposal.
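
For what it's worth, here is a rough sketch of the "list top requestors, do statistics, then block" idea in Python. It assumes the client address is the first whitespace-separated field of each access log line, as in the common log format. The function names, the median-based rule, and the `factor` and `min_requests` thresholds are all made up for illustration; real traffic would need tuning and probably a whitelist for the crawlers you want to keep.

```python
from collections import Counter
from statistics import median


def count_requests(log_lines):
    """Count requests per client.

    Assumption: the client address is the first whitespace-separated
    field of every log line.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def flag_heavy_clients(counts, factor=10, min_requests=100):
    """Return clients whose request count is far above the typical client.

    Hypothetical rule: flag a client when its count is both at least
    `min_requests` and more than `factor` times the median count.
    """
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(
        client
        for client, n in counts.items()
        if n >= min_requests and n > factor * typical
    )
```

The flagged list could then be fed to whatever blocking mechanism you already use (firewall rule, web server deny list). The median is used here instead of the mean so that one very heavy scraper does not drag the baseline up and hide itself.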