by alecbaldwinlol 3688 days ago
I imagine maybe 10x more than this. The numbers actually aren't too shocking if you think of the mirroring of data that's required for maximum uptime (not even archival), and the occasional disk failure.

1,000 TB (1PB) can be easily handled across ~150 (6-7TB)HDDs for one copy, but 300-450 HDDs would be required for additional mirroring.
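As a rough check on those drive counts, here is a minimal sketch of the arithmetic. It assumes 1 PB = 1,000 TB, ignores filesystem overhead and hot spares, and treats mirroring as whole extra copies; `drives_needed` is a made-up helper name, not anything from a real tool.

```python
import math


def drives_needed(capacity_tb: float, drive_tb: float, copies: int = 1) -> int:
    """Whole drives needed to hold `copies` full copies of `capacity_tb`."""
    return math.ceil(capacity_tb * copies / drive_tb)


# 1 PB on 6, 6.5 and 7 TB drives, with one, two and three full copies.
for drive_tb in (6, 6.5, 7):
    counts = [drives_needed(1000, drive_tb, copies) for copies in (1, 2, 3)]
    print(f"{drive_tb} TB drives -> 1x/2x/3x copies: {counts}")
```

With 6.5 TB drives the single-copy figure lands near the ~150 quoted above, and two to three copies land roughly in the 300-450 range.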

The largest tape cartridges out there hold between 6 and 8.5 TB and cost around $22 per TB. That's only $22,000 per PB, and this is for high-throughput cartridges like LTO-7 or StorageTek Titanium. LTO-5 is much cheaper.

Considering that the largest tech companies and major organizations routinely cut POs for several hundred thousand dollars and deal with hundreds of PB of data across disk, tape, DVDs, etc., it isn't outside the realm of possibility to have 300,000+ individual disks and tapes floating out there :)


Anecdotally, Google's file system Colossus uses Reed-Solomon encoding with about 1.5x storage overhead rather than full replication. So those 150 drives might only turn out to be in the low 200s (150 x 1.5 = 225).

And I remember reading a tweet from a Google engineer that they would be paged if their free storage dropped below 5PB.

I thought waking up at 3am, shambling to my desk and connecting to the VPN was bad. Imagine having to drive down to the datacenter and rack 200 hard drives.

Sweet, I appreciate that info :) I imagine 5PB fills up quite quickly for Google too!

At that scale, it's just a function of (number of ethernet cables) x (avg size of ethernet cables), rather than disk space in their data center, I'd imagine!

> 1,000 TB (1PB) can be easily handled across ~150 (6-7TB)HDDs for one copy, but 300-450 HDDs would be required for additional mirroring.

Mirroring is not used at that scale.

https://www.backblaze.com/blog/reed-solomon/

https://blogs.msdn.microsoft.com/windowsazurestorage/2012/06...

That's all good; it would just increase the number of disks that have to be purchased and the electricity, cooling, and floor space needed to maintain them. It would add to the disk count, but not really affect cost per TB all that much.

How do cloud storage vendors guarantee triple-mirroring and uptime then? Lots of 2TB drives? Lies? :)

Does it say triple mirroring or triple redundancy? Take a look at the article from Backblaze for an overview of the math. In their case it might be called quadruple redundancy: each file is stored as 20 shards, 17 of data plus 3 of parity. You could lose the drives holding any three of those shards and still not lose anything.
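To put numbers on the difference, here is a small sketch comparing the raw-storage overhead of the 17+3 scheme with plain triple mirroring. Modeling 3x mirroring as "1 data shard plus 2 extra copies" is my own simplification for the comparison, and `storage_overhead` is a hypothetical helper, not part of any Backblaze code.

```python
from fractions import Fraction


def storage_overhead(data_shards: int, parity_shards: int) -> Fraction:
    """Raw bytes stored per byte of user data for a (data + parity) layout."""
    return Fraction(data_shards + parity_shards, data_shards)


# (data shards, redundant shards); mirroring is modeled as 1 + 2 copies.
schemes = {
    "17+3 erasure coding": (17, 3),
    "3x mirroring": (1, 2),
}

for name, (data, redundant) in schemes.items():
    overhead = storage_overhead(data, redundant)
    print(f"{name}: {float(overhead):.2f}x raw storage, "
          f"survives {redundant} lost drives")
```

Under these assumptions the erasure-coded layout tolerates one more drive loss than triple mirroring while storing well under half as many raw bytes.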
> 1,000 TB (1PB) can be easily handled across ~150 (6-7TB)HDDs for one copy

Yea, our latest storage pod (https://www.backblaze.com/blog/open-source-data-storage-serv...) has 60 drives at about 8TB apiece, so we're pushing 480TB. Two pods are about a petabyte. If you go up to the 16TB hard drives some of the manufacturers are testing, you can hit pretty close to 1PB in a single enclosure, and Dropbox is actually doing that already with their "Diskotech" boxes (HN link -> https://news.ycombinator.com/item?id=11282948), so folks are already getting more and more dense :D
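The density math is easy to sanity-check. This is just a sketch of raw capacity, assuming 1 PB = 1,000 TB and ignoring parity or formatting overhead; the function names are invented for the example.

```python
def pod_capacity_tb(drive_count: int, drive_tb: float) -> float:
    """Raw capacity of one enclosure in TB, before any redundancy."""
    return drive_count * drive_tb


def pods_per_petabyte(drive_count: int, drive_tb: float) -> float:
    """How many such enclosures add up to 1,000 TB of raw capacity."""
    return 1000 / pod_capacity_tb(drive_count, drive_tb)


for drive_tb in (8, 16):
    capacity = pod_capacity_tb(60, drive_tb)
    pods = pods_per_petabyte(60, drive_tb)
    print(f"60 x {drive_tb} TB = {capacity:.0f} TB per pod, "
          f"{pods:.2f} pods per PB")
```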