by dijit 3416 days ago
You're taking the wrong approach.

We don't have auto-healing HA. You have 32 master databases, each with replica databases underneath it using synchronous replication, meaning a write must be synced to both master and replica before the client receives COMMIT OK.

Then you do the sharding logic in the application.
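
To make "sharding logic in the application" concrete, here is a minimal sketch. It assumes keys are routed by a stable hash modulo the 32 masters; the hashing scheme, the `shard_for`/`dsn_for` helpers and the host names are all hypothetical, since the comment doesn't say how keys are actually mapped.

```python
import hashlib

NUM_SHARDS = 32  # from the comment: 32 master databases

# Hypothetical connection strings; the host names are placeholders.
SHARD_DSNS = [f"host=shard{i:02d}-master dbname=app" for i in range(NUM_SHARDS)]


def shard_for(key: str) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is salted per process, so it is not
    suitable for routing; SHA-256 gives the same answer everywhere.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def dsn_for(key: str) -> str:
    """Return the (hypothetical) master DSN that owns this key."""
    return SHARD_DSNS[shard_for(key)]
```

Every client computes the same shard for the same key, so no coordinator is needed to decide where a write goes.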

No write can be sent back as being "OK" unless it's on disk on 2 servers which represent a vertical slice of our entire database structure.

We mostly assume power-loss scenarios, which means that if a write is on disk and not just sitting in the VFS layer, we're fine. Power loss is more likely than complete RAID degradation or a server disappearing, although the replicas help with that too.

You don't need quorum at all in this scenario, and no matter which database or client fails you will not lose data that you've acknowledged, even on immediate power loss to 50% of your entire infra.
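
A toy model of that acknowledgement rule, as a sketch only: `Node` and `Shard` are hypothetical in-memory stand-ins, not a Postgres API, and `disk` stands in for data that has been flushed to stable storage.

```python
class Node:
    """Hypothetical database server; `disk` models flushed, durable data."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.disk = []

    def persist(self, record):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.disk.append(record)


class Shard:
    """One master plus one synchronous replica."""

    def __init__(self, master, replica):
        self.master = master
        self.replica = replica
        self.acked = []  # records the client was told are committed

    def commit(self, record):
        """Return True ("COMMIT OK") only once both nodes hold the record."""
        try:
            self.master.persist(record)
            self.replica.persist(record)
        except ConnectionError:
            # The record may exist on one node, but the client never got OK.
            return False
        self.acked.append(record)
        return True
```

The invariant is that everything in `acked` is on both disks, so losing either node alone cannot lose an acknowledged write. Unacknowledged writes carry no such guarantee.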

1 comments

So what do you do when the master for one of your shards fails? Do you drop all incoming writes for the shard on the floor? Or do you fail over to the shard's slave and promote the slave to the new master?

Since you said you don't have "auto healing HA" I assume you don't fail over, but discard/deny incoming writes until the master comes back up.
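
To spell out the behaviour I'm assuming, here is a sketch of a write path with no failover. The names `ShardUnavailable` and `write` are made up for illustration and this is a guess at your setup, not a description of it.

```python
class ShardUnavailable(Exception):
    """Raised when the master that owns a shard is down."""


def write(masters_up, shard_id, record, log):
    """Accept a write only if the shard's master is up; never fail over.

    masters_up: dict mapping shard id -> bool (is the master reachable?)
    log: list collecting accepted (shard_id, record) pairs
    """
    if not masters_up[shard_id]:
        # No promotion of the replica: the write is simply refused.
        raise ShardUnavailable(f"shard {shard_id} has no master")
    log.append((shard_id, record))
```

Writes to the other shards keep working, but the affected shard is unavailable for writes until its master comes back.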

This is a valid approach, but I don't see how it contradicts what I said at all:

  - I said you can't get full ACID and HA failover at the same time with postgres
  
  - Your scheme does not provide HA failover

I explicitly said that if you can forgo either full ACID or HA in case of a failure, postgres is fine.