Hacker News
by bborud 5263 days ago
Where does the 200GB figure come from? I was quite busy building a web crawler too at the time, and I can distinctly remember that our crawlers had about 17TB of storage. So let's say we had crawled something like 15TB of data to get a meaningful sample of the web.

I agree with the gist of the blog posting though.


The Salon article at http://www.salon.com/1998/12/21/straight_44/ says: "Page says the current version of Google, which has indexed about 60 million pages, will continue to be improved as the company expands." And http://en.wikipedia.org/wiki/History_of_Google#cite_note-sal... lists "Total indexable HTML urls: 75.2306 Million" and "Total content downloaded: 207.022 gigabytes."
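A quick back-of-the-envelope check of those figures, as a rough sketch only. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and that each of the 75.2306 million indexable URLs was downloaded exactly once; neither source states either assumption. The variable names are illustrative.

```python
# Figures as quoted from the cited sources.
INDEXABLE_URLS = 75.2306e6     # "Total indexable HTML urls: 75.2306 Million"
DOWNLOADED_BYTES = 207.022e9   # "207.022 gigabytes", assuming decimal GB (10**9 bytes)

# The parent comment's rough crawl size, assuming decimal TB (10**12 bytes).
OTHER_CRAWL_BYTES = 15e12

# Average downloaded bytes per indexable URL (assumes one download per URL).
bytes_per_url = DOWNLOADED_BYTES / INDEXABLE_URLS

# How many times larger a 15 TB crawl is than the 207 GB figure.
size_ratio = OTHER_CRAWL_BYTES / DOWNLOADED_BYTES

print(f"~{bytes_per_url / 1e3:.2f} KB per URL")   # prints "~2.75 KB per URL"
print(f"15 TB is ~{size_ratio:.0f}x 207 GB")      # prints "15 TB is ~72x 207 GB"
```

Under those assumptions, the 207 GB figure works out to roughly 2.75 KB per URL, and a 15 TB crawl would be about 72 times larger.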