Hacker News new | ask | show | jobs
by g0del_was_wr0ng 3416 days ago
No, but when your requirement is to "never loose a write regardless of which server fails" you have to make sure the mutation was acknowledged by more than one server.

So the least you have to do is to wait for a second server to confirm the write. But now what do you do when the network link between the two servers goes down in-between a write? The write might have already been applied on the first server or it might not have. The same goes for the second server. But you can't be sure either way.

For our thought experiment, let's assume that while the servers can't talk to each other anymore, some clients can still talk to each of the servers.

At this point, you either have to give up availability (i.e. don't accept any further writes until the fault is resolved) or consistency (i.e. potentially return unacknowledged/stale data from one of the nodes until the fault is resolved).

While returning unacknowledged/stale data is acceptable for some usecases, it does break ACID semantics. You can't get both ACID and "always-on" availability with a simple replication scheme like this.

If you want linearizability and HA, you need at least three servers and a more complex, quorum-based scheme. However, Postgres doesn't support that (AFAIK) - you have to use something like Zookeeper or CockroachDB. I think this is what zzzcpan meant.

(I realize that you only spoke about never loosing a write and didn't say anything about updates/queries. If you don't care about ACID, postgres with synchronous replication is a good highly available solution)

1 comments

You're taking the wrong approach.

We don't have auto-healing HA, you have 32 master databases which have replica databases underneath them with synchronous replication. Meaning things must be synced to both before the COMMIT OK is received by the client.

Then you do the sharding logic in the application.

No write can be sent back as being "OK" unless it's on disk on 2 servers which represent a vertical slice of our entire database structure.

We assume power-loss scenarios mostly, which means if it's on disk and not in vfs then we're fine- as power-loss is more likely than complete raid degradation or server disappearance, although the replicas help with that too.

You don't need quorum at all in this scenario, and no matter which database or client fails you will not lose data that you've acknowledged, even on immediate power loss to 50% of your entire infra.

So what do you do when the master for one of your shards fails? Do you drop all incoming writes for the shard on the floor? Or do you fail over to the shard's slave and promote the slave to the new master?

Since you said you don't have "auto healing HA" I assume you don't fail over, but discard/deny incoming writes until the master comes back up.

This is a valid approach, but, I don't see how it contradicts what I said at all:

  - I said you can't get full ACID and HA failover at the same time with postgres
  
  - Your scheme does not provide HA failover

I explicitly said that if you can forgo either full ACID or HA in case of a failure, postgres is fine.