Hacker News new | ask | show | jobs
by rli 4931 days ago
robots.txt can be ignored, it's just a reference for honest spiders. I think the way described above, of listing top requestors, doing statistics and then automating blocking is indeed the best way. Could also be there's a blocklist or two around of malicious scrapers. And if there isn't, that's a new business proposal.