by DanWaterworth 3878 days ago
When you have another 60 HDDs in the same case, I guess you think about reliability differently.

The 60 HDDs aren't single points of failure. {Edit: I mean for the server, not for the whole system.}

And a RAID 1 pair of HDDs for the system disk is more expensive than a small SSD and fussier, and the SSD is still less likely to fail outright.

It doesn't sound like the boot drive is a single point of failure, since data is stored in Reed-Solomon-coded chunks across many pods. If a data drive fails, the whole pod also has to go down for maintenance so the drive can be replaced. The only difference is that with a data drive you get to choose when to take the pod down.
That's the situation I was talking about, yes. It's much cheaper to go into the datacenter once a month to replace failed data disks than to go in promptly whenever a system disk HDD fails so that 60 disks don't sit idle.
Sure, that would make sense if they only had personnel in the datacenter once a month, but: "we replace about 10 drives every day" [1].

[1] https://www.backblaze.com/blog/vault-cloud-storage-architect...

To clarify, I'm not saying that they wouldn't be better off with an SSD boot drive. I'm arguing that having an HDD boot drive, given their setup, isn't awful.