by wslh 437 days ago
I think those crawlers are just very generic: they basically operate like wget scripts, without much logic for avoiding sites that already offer clean data dumps.
1 comment

That is not an excuse. Wikipedia isn't just any site.
Not an excuse, a plausible explanation of what's actually happening.
It's also plausible that they are trying to kill the site with a soft DDoS. Then they can sell a service based on all the data they scraped, plus unauditable censoring.