by craigmccaskill 5453 days ago
Some quick napkin math on numbers:

If you're to believe Eric Schmidt when he says 'millions' and put that at 3 million users (being generous), guessing that each user uploads 20 megabytes of content (again, generous), that's:

3x10^6 × 20 MB

6x10^7 MB or

60TB
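The same estimate as a quick Python sketch, for anyone who wants to vary the inputs. Both numbers are the guesses from above, not measured values, and the conversion assumes decimal units (1 TB = 10^6 MB).

```python
# Napkin estimate: both inputs are guesses from the post, not real figures.
users = 3 * 10**6        # "millions" taken as 3 million users
mb_per_user = 20         # guessed upload volume per user, in MB

total_mb = users * mb_per_user
total_tb = total_mb / 10**6   # decimal units: 1 TB = 10^6 MB

print(f"{total_mb:.0e} MB = {total_tb:.0f} TB")  # prints "6e+07 MB = 60 TB"
```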

Sanity check: does that not seem like a lot of resources to allocate to a project of this size?

Edit: formatting

5 comments

I don't think this has anything to do with how much content a user uploads. They specifically said that “For about 80 minutes we ran out of disk space on the service that keeps track of notifications.”

So probably they had allocated some amount that made perfect sense while testing, but was too low for the full rollout that seems to (effectively) be happening now.

Quite simply, no. But if it was notifications they were probably going through 'Chubby' [1] which is sometimes used as a scratchpad for notifications. But if you read the interview with Sean Quinlan [2] you can get a sense that 60TB is quite small.

[1] http://labs.google.com/papers/chubby.html

[2] http://queue.acm.org/detail.cfm?id=1594206

Maybe some heavy logging in this field test phase could explain it? Indeed, 60TB sounds like very little. I thought Google would use GFS to back this kind of thing, so the disk space wouldn't be an issue. What do I know.
60TB? That's about a couple grand worth of space. Why would that be a lot of space to Google?
Even using 3TB consumer grade SATA disks you'd need 20 of them; an enclosure for 20 disks costs a lot more than a couple of grand.
That's a chassis for 2.5" disks so you'd be looking at 60x1TB disks, and that would mean 3 of those enclosures. Now rack them somewhere and add power - still much more than a couple of grand, we haven't even paid for the disks yet...
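The disk counts above can be checked with a small sketch. It covers raw capacity only (no RAID or replication overhead), and the 24-bay chassis size is an assumption for illustration, not a figure from this thread.

```python
import math

def disks_needed(capacity_tb: float, disk_tb: float) -> int:
    """Number of disks needed to hold capacity_tb of raw data, rounded up."""
    return math.ceil(capacity_tb / disk_tb)

CAPACITY_TB = 60          # the 60TB estimate from the original post
BAYS_PER_CHASSIS = 24     # assumed bay count for a 2.5" chassis (hypothetical)

disks_3tb = disks_needed(CAPACITY_TB, 3)   # 3TB consumer SATA disks
disks_1tb = disks_needed(CAPACITY_TB, 1)   # 1TB 2.5" disks
chassis_1tb = math.ceil(disks_1tb / BAYS_PER_CHASSIS)

print(disks_3tb, disks_1tb, chassis_1tb)   # prints "20 60 3"
```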
Yeah, I was a tad hyperbolic in just referring to the disks. I would expect the costs to be around $20/GB/year when you also factor in power - bigger drives are making a difference, but the other factors always cost more than the disks themselves.

It doesn't change the fact that 60TB is tiny for a company whose every product involves storing enormous quantities of data and serving them at monstrous scale.

And according to the GFS paper there are three copies of every chunk in a GFS cluster, so that is 180TB, and they probably don't depend on one GFS cluster to meet their availability guidelines, so if you had two that is really 360TB (180TB * 2).
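A minimal sketch of the replication arithmetic, assuming the 60TB raw estimate from the original post, three replicas per chunk, and a guessed two clusters (the cluster count is speculation, not a known figure):

```python
raw_tb = 60               # raw estimate from the original post
replicas_per_chunk = 3    # three copies of every chunk, per the GFS paper
clusters = 2              # guess: data mirrored across two clusters

per_cluster_tb = raw_tb * replicas_per_chunk
total_stored_tb = per_cluster_tb * clusters

print(per_cluster_tb, total_stored_tb)   # prints "180 360"
```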

And the amazing part is that if you are at an open event where Google is talking about their infrastructure in general terms, you will realize that this has to be mouse nuts compared to the amount of 'spinning rust' they have going on at any one time.

Absolutely agree!
Sorry, posted the wrong chassis. Look for the SC847, that's 45x 3.5" in 4U. With 32x 2TB SATA you're looking at roughly $5000 (incl. disks).

And Google likely gets them quite a bit cheaper than that.

Eh, I was definitely lowballing a lot with "a couple grand." You also need to factor in power and replication.
I have on the order of six terabytes knocking around in my house. I think the chocolate factory can store a few orders of magnitude more than me before it starts to sweat.