| HN Mirror

	by rli 4978 days ago
	robots.txt can be ignored; it's only a convention that honest spiders follow. I think the approach described above (listing the top requestors, running statistics on them, and then automating the blocking) is indeed the best way. There may also be a blocklist or two of malicious scrapers around. And if there isn't, that's a new business idea.
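
A rough sketch of the "list top requestors, do statistics, then flag for blocking" idea, in Python. Everything here is an assumption for illustration: the function names are made up, the log format is assumed to put the client IP in the first whitespace-separated field (as the Common Log Format does), and the "more than 10x the median, and at least 100 requests" rule is an arbitrary starting point, not a recommendation.

```python
from collections import Counter
from statistics import median


def request_counts(log_lines):
    """Count requests per client IP.

    Assumes the client IP is the first whitespace-separated field of each line.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def flag_heavy_hitters(counts, factor=10, min_requests=100):
    """Return IPs whose request count is far above the typical client.

    An IP is flagged when it made at least `min_requests` requests and more
    than `factor` times the median per-IP count. Both thresholds are
    hypothetical and would need tuning against real traffic.
    """
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(
        ip
        for ip, n in counts.items()
        if n >= min_requests and n > factor * typical
    )


if __name__ == "__main__":
    # Synthetic log lines, made up for this example.
    sample = ['10.0.0.1 - - "GET /page HTTP/1.1" 200'] * 500
    sample += ['10.0.0.2 - - "GET / HTTP/1.1" 200'] * 5
    sample += ['10.0.0.3 - - "GET / HTTP/1.1" 200'] * 3
    counts = request_counts(sample)
    print(counts.most_common(3))       # top requestors
    print(flag_heavy_hitters(counts))  # candidates for blocking
```

The flagged list is only a set of candidates; it could be reviewed by hand or fed into whatever blocking mechanism is already in place. A per-time-window count would probably be more robust than a whole-log total, but that is left out to keep the sketch short.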