by fragmede 483 days ago
The Common Crawl dataset is massive, though I can't speak to how well it would perform here.

http://commoncrawl.org