by Smerity 4418 days ago
For only one million web pages, the job would likely be quite cheap. The Common Crawl corpus is hundreds of millions of pages and, given the right setup, costs only $10 to $100 to process, especially for relatively light work such as entity extraction. Heavier operations, such as full parsing with NLP tools, will cost correspondingly more.
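
To put a rough number on "quite cheap", here is a back-of-the-envelope sketch. It assumes cost scales linearly with page count and, purely for illustration, takes "hundreds of millions" to mean 500 million pages. Both are my assumptions, not measured figures, and `estimate_cost` is a hypothetical helper name.

```python
def estimate_cost(pages, corpus_pages, corpus_cost_low, corpus_cost_high):
    """Scale a whole-corpus cost range down to a smaller page count.

    Assumes cost is linear in the number of pages, which ignores fixed
    overhead such as cluster startup.
    """
    fraction = pages / corpus_pages
    return corpus_cost_low * fraction, corpus_cost_high * fraction


# Assumption: 500 million pages stands in for "hundreds of millions".
# The $10 to $100 range is the one quoted above for light processing.
low, high = estimate_cost(1_000_000, 500_000_000, 10.0, 100.0)
print(f"${low:.2f} to ${high:.2f}")
```

Under those assumptions the estimate comes out to a few cents, so in practice fixed overhead would probably dominate the bill for a job this small.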