Hacker News
by wz1000 2145 days ago
https://en.wikipedia.org/wiki/Utah_Data_Center

> designed to store data estimated to be on the order of exabytes or larger

While Google says:

> The Google Search index contains hundreds of billions of webpages and is well over 100,000,000 gigabytes in size

https://www.google.com/search/howsearchworks/crawling-indexi...

Seems like the government already has the expertise and equipment required to handle data at Google scale :)
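The two figures can be put on the same scale with a quick conversion. This is a minimal sketch that assumes decimal SI units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes) and treats Google's "well over 100,000,000 gigabytes" as exactly 10^8 GB, i.e. a lower bound:

```python
# Decimal (SI) units, assumed here: 1 GB = 10**9 bytes, 1 PB = 10**15, 1 EB = 10**18.
GB = 10**9
PB = 10**15
EB = 10**18

# Google's stated figure, taken as a lower bound ("well over").
google_index_bytes = 100_000_000 * GB


def to_unit(n_bytes: int, unit: int) -> float:
    """Express a byte count in the given unit."""
    return n_bytes / unit


index_pb = to_unit(google_index_bytes, PB)
index_eb = to_unit(google_index_bytes, EB)

# How many copies of the index fit in one exabyte (the low end of the Utah estimate).
copies_per_eb = EB // google_index_bytes

print(f"Index: {index_pb:g} PB = {index_eb:g} EB")
print(f"Copies per exabyte: {copies_per_eb}")
```

Under those assumptions the index is about 100 PB, or 0.1 EB, so an exabyte-scale facility could hold roughly ten copies of it.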

Having disk space is only a small part of being able to make use of something like a search index. Arguably the least difficult part.

One of your suggestions was

> That's why the stipulation that other entities be allowed to mirror the index - they can optimize the index for their own purposes and rankings on their own hardware.

And the point is that there's nobody who can do this outside of Google, Microsoft (which also already runs a search index), Facebook, and Amazon.

Not to mention the problems of actually getting the data. At this scale, trucks full of disks move data faster than network cables do, unless you have direct fiber backbone connections.
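A back-of-the-envelope sketch makes the trucks-versus-cables point concrete. Only the ~100 PB index size comes from Google's quote above; the link speeds, full link utilization, and the truck's capacity and two-day trip are hypothetical values chosen for illustration:

```python
SECONDS_PER_DAY = 86_400
# ~100 PB (decimal), from Google's "well over 100,000,000 gigabytes".
INDEX_BYTES = 100 * 10**15


def transfer_days(n_bytes: int, gbit_per_s: float) -> float:
    """Days to push n_bytes through a link at gbit_per_s, assuming full utilization."""
    seconds = n_bytes * 8 / (gbit_per_s * 10**9)
    return seconds / SECONDS_PER_DAY


def truck_gbit_per_s(bytes_per_truck: int, trip_days: float) -> float:
    """Effective bandwidth of one truck of disks over a single trip, in Gbit/s.

    Ignores the time to write the disks before departure and read them on arrival.
    """
    return bytes_per_truck * 8 / (trip_days * SECONDS_PER_DAY) / 10**9


# Hypothetical link speeds in Gbit/s.
for link in (1, 10, 100):
    print(f"{link:>4} Gbit/s link: {transfer_days(INDEX_BYTES, link):8.1f} days")

# Hypothetical truck carrying the whole ~100 PB on a 2-day drive.
print(f"Truck: {truck_gbit_per_s(INDEX_BYTES, 2):.0f} Gbit/s effective")
```

With these numbers, a 10 Gbit/s link needs roughly 926 days and even a 100 Gbit/s link about 93 days, while the hypothetical truck works out to several thousand Gbit/s of effective bandwidth, before accounting for the time spent copying data onto and off the disks.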

99% agreement, except for the US-centric viewpoint. I think it is likely that Baidu has the scale too.

I knew I was missing someone. Yes, Baidu (which also already runs a large search index) could probably do the same thing.

A giant building full of hard drives is NOT a Google datacenter.

You're just comparing two storage numbers without taking into account everything else that running Google at a global scale requires.

The NSA also monitors most of the world's communications in near real time. A building full of hard drives is also completely useless without some reasonably decent search and indexing capability, so I'm pretty sure the NSA built something for the task.
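To show what "search and indexing capability" adds beyond raw storage: the core data structure of a text search index is an inverted index, a map from each term to the documents that contain it. This minimal sketch says nothing about how Google or the NSA actually build theirs. The tokenizer (lowercase, split on non-alphanumerics) and the sample documents are simplifying assumptions:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of non-alphanumeric characters (a simplifying assumption)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "A building full of hard drives",
    2: "Search and indexing capability",
    3: "Hard drives need an index to be searchable",
}
index = build_index(docs)
print(sorted(search(index, "hard drives")))
```

A production index also needs ranking, sharding across many machines, and incremental updates as data arrives, which is why storage alone is the easy part.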