I think those crawlers are just very generic: they basically operate like wget scripts, without much logic for avoiding sites that already offer clean data dumps.
Also plausibly they are trying to kill the site via soft ddos. Then they can sell a service based on all the data they scraped + unauditable censoring.