Hacker News new | ask | show | jobs
by jiggawatts 185 days ago
So if you put a 3-way cluster in the same building and they lose power together, then what? Is your data toast?
2 comments

If I make certain assumptions and you respect them, I will give you certain guarantees. If you don't respect them, I won't guarantee anything. I won't guarantee that your data will be toast either.
If you can't guarantee anything for all the nodes losing power at the same time, that's really bad.

If it's just the write buffer at risk, that's fine. But the chance of overlapping power loss across multiple sites isn't low enough to risk all the existing data.

I disagree that it's bad, it's a choice. You can't protect against everything. The team made calculations and decided that the cost to protect against this very low probability is not worth it. If all the nodes lose power you may have a bigger problem than that
Power outages across big areas are common enough.

It's downright stupid if you build a system that loses all existing data when all nodes go down uncleanly, not even simultaneously but just overlapping. What if you just happen to input a shutdown command the wrong way?

I really hope they meant to just say the write buffer gets lost.

That's why you need to go to other regions, not remain in the same area. Putting all your eggs in one basket (single area) _is_ stupid. Having a single shutdown command for the whole cluster _is_ stupid. Still accepting writes when the system is in a degraded state _is_ stupid. Don't make it sound worse than it actually is just to prove your point.
> Still accepting writes when the system is in a degraded state _is_ stupid.

Again, I'm not concerned for new writes, I'm concerned for all existing data from the previous months and years.

And getting in this situation only takes one out of a wide outage or a bad push that takes down the cluster. Even if that's stupid, it's a common enough stupid that you should never risk your data on the certainty you won't make that mistake.

You can't protect against everything, but you should definitely protect against unclean shutdown.

It sounds like that's a possibility, but why on earth would you take the time to setup a 3 node cluster of object storage for reliability and ignore one of the key tenants of what makes it reliable?