by jeffbee 2145 days ago
There are currently zero governments and only about 4 commercial entities possessing a datacenter large enough to do the job.
2 comments

https://en.wikipedia.org/wiki/Utah_Data_Center

> designed to store data estimated to be on the order of exabytes or larger

While Google says:

> The Google Search index contains hundreds of billions of webpages and is well over 100,000,000 gigabytes in size

https://www.google.com/search/howsearchworks/crawling-indexi...

Seems like the government already has the expertise and equipment required to handle data at Google scale :)
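To put the two quoted figures in the same unit, here is a rough sketch. It assumes decimal units (1 GB = 10^9 bytes) and treats Google's "well over 100,000,000 gigabytes" as a lower bound, so the real index is larger than what this prints.

```python
# Decimal (SI) storage units; an assumption, since the quotes don't say which they use.
GB = 10**9
PB = 10**15
EB = 10**18

# Lower bound quoted by Google: "well over 100,000,000 gigabytes".
index_bytes = 100_000_000 * GB

index_pb = index_bytes / PB
index_eb = index_bytes / EB

print(f"Search index lower bound: {index_pb:.0f} PB = {index_eb:.1f} EB")
```

So the quoted lower bound is about a tenth of an exabyte, while the Utah facility is described as being on the order of exabytes or larger. This only compares raw storage, nothing else.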

Having disk space is only a small part of being able to make use of something like a search index. Arguably the least difficult part.

One of your suggestions was

> That's why the stipulation that other entities be allowed to mirror the index - they can optimize the index for their own purposes and rankings on their own hardware.

And the point is that nobody outside of Google, Microsoft (which already runs its own index), Facebook, and Amazon can do this.

Not to mention the problem of actually getting the data. You're at the scale where shipping trucks full of disks moves data faster than cables do, unless you have direct fiber backbone connections.
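A back-of-the-envelope sketch of the network side of that claim. The index size is the 100,000,000 GB lower bound quoted earlier in the thread (decimal units assumed), and the link speeds are hypothetical sustained rates picked for illustration, with no protocol overhead.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: int, link_bits_per_s: float) -> float:
    """Days needed to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_s / SECONDS_PER_DAY


# 100,000,000 GB in decimal units; a lower bound, so real transfers take longer.
INDEX_BYTES = 100_000_000 * 10**9

# Hypothetical sustained link speeds in bits per second (assumptions, not measurements).
LINKS = {"10 Gbit/s": 10e9, "100 Gbit/s": 100e9, "1 Tbit/s": 1e12}

for name, speed in LINKS.items():
    print(f"{name}: {transfer_days(INDEX_BYTES, speed):,.1f} days")
```

Under those assumptions a single 10 Gbit/s link needs roughly two and a half years for one copy, which is the sense in which a truck wins unless you have very fat pipes.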

99% agreement, except for the Americentric viewpoint. I think it is likely that Baidu has the scale.
I knew I was missing someone. Yes, Baidu (who also already runs a large search index) could probably do the same thing.
A giant building full of hard drives is NOT a Google datacenter.

You're just comparing two storage numbers without taking into account anything else that running Google at a global scale requires.

The NSA also monitors most of the world's communications in near real time. A building full of hard drives is also completely useless without some sort of reasonably decent search and indexing capability, so I'm pretty sure the NSA built something for the task.
Doesn't the NSA have a bunch of massive datacenters?

Regardless of that, the datacenter can be appropriated by the government too.

No, the NSA has one facility that would be a small/medium datacenter in the big league, but only if you assume that the NSA is as efficient as Google, which is a bit of a stretch.

NSA Utah: 65 MW
Google Pryor, Oklahoma: 340 MW

Megawatts are indicative of compute load, not storage load. I can definitely believe that Google is doing more compute than NSA, but that sounds more like a difference of need, not of ability.
What do you think the query pipeline looks like?

I can assure you that it's not mapping each query down to a single-sector disk read off an inverted index.

?

I think the query pipeline for the NSA (relative to the scale of Google's query pipeline) looks like absence-of-query-pipeline. Hence the NSA uses less compute and thus (the reasoning goes) less power.

Presumably that Google data center does a lot of compute-intensive, non-search-related stuff, like GCP for one.
Another thing that people persistently misunderstand is the scale relationship between GCP and the rest of Google.
Even if all you say is true and it is truly impossible for the government to replicate any of what Google does, the point is moot. If the government is going to appropriate Google's index, it might as well appropriate the datacenters too. Really, what's Google going to do with them once search is gone? According to you, it is the only thing they have running there.
Might as well appropriate the engineers, too, and chain them to their desks and force them to keep everything working.
It also does a lot of storage-intensive, non-search-related stuff, like Google Cloud Storage.

Every Google data center does...everything.