by joshpen188 3353 days ago
Why didn't you use common crawl instead?

For our purposes, Common Crawl's corpus was missing too many websites (possibly because of those sites' robots.txt configurations). We also needed deeper coverage than CC could provide.
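To illustrate the robots.txt point: a site can opt out of Common Crawl specifically by disallowing its crawler's user agent, CCBot, while still admitting other bots. Here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `OtherBot` agent, and the `allowed` helper are hypothetical, made up for illustration rather than taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks Common Crawl's crawler (CCBot) from the
# whole site, but allows every other user agent.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("CCBot", "https://example.com/page"))     # False
print(allowed("OtherBot", "https://example.com/page"))  # True
```

A page blocked this way never enters the Common Crawl corpus, even though an independent crawler that is not named in the file could still fetch it.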