by arnaudsm
663 days ago
I don't buy this number. Text-only Common Crawl is 20TB. Remove spam and dupes, and you're left with under 10TB of current, useful data, which you can parse and index on a single server nowadays. It's the full Google index history with full HTML that is probably 12PB, but the useful part of the search engine is much smaller.