HN Mirror



	by wslh 485 days ago
	I think those crawlers are just very generic: they basically operate like wget scripts, without much logic for avoiding sites that already offer clean data dumps.


ldng 485 days ago

That is not an excuse. Wikipedia isn't just any site.


wslh 485 days ago

Not an excuse, a plausible explanation of what's actually happening.


franktankbank 485 days ago

Also plausibly, they are trying to kill the site via a soft DDoS. Then they can sell a service based on all the data they scraped, plus unauditable censoring.
