by z3t4 3334 days ago
In theory, say you want to index one billion (10^9) web pages. On modern hardware you should be able to crawl 10,000 pages per second, so the crawl would take roughly 28 hours. If you save 1 KB of text from each page, that comes to about 1 TB of data. A brute-force text search over 1 TB would take some time though, maybe minutes. You could partition the data across servers to cut that down.
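For anyone who wants to check the arithmetic, here is a rough sketch in Python. The page count, crawl rate, and per-page size are the numbers from the comment; the scan throughput (5 GB/s per server) and the function name `crawl_estimate` are my own assumptions for illustration, not measured figures.

```python
def crawl_estimate(pages=10**9, pages_per_second=10_000, bytes_per_page=1_000,
                   scan_bytes_per_second=5 * 10**9, servers=1):
    """Back-of-envelope numbers for crawling and brute-force scanning.

    scan_bytes_per_second is an ASSUMED sequential read rate per server
    (5 GB/s here); swap in whatever your hardware actually does.
    """
    # Time to fetch every page at a fixed crawl rate, in hours.
    crawl_hours = pages / pages_per_second / 3600
    # Total stored text, in bytes (1 KB taken as 1,000 bytes).
    total_bytes = pages * bytes_per_page
    # Full scan time if the data is split evenly and scanned in parallel.
    scan_seconds = total_bytes / (scan_bytes_per_second * servers)
    return crawl_hours, total_bytes, scan_seconds


if __name__ == "__main__":
    for n in (1, 10):
        hours, size, scan = crawl_estimate(servers=n)
        print(f"servers={n}: crawl {hours:.1f} h, "
              f"{size / 10**12:.1f} TB, full scan {scan:.0f} s")
```

Under those assumptions a single machine needs a few minutes for a full scan, and splitting the data evenly across ten servers brings it down by a factor of ten. It ignores network overhead, politeness delays, and any real index structure, so treat it as an upper-level sanity check only.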