HN Mirror


by pjungwir 3988 days ago

Since most of the comments are critical, I'll say: thank you for the awesome writeup! I agree this is more complex than HA PG setups I've done in the past, but I'm thrilled to have another perspective. Also, doing a thorough writeup like this takes time, and a lot of people would rather jump back into building the next thing. It's a great contribution!

I agree with pilif that you almost always want to fail over the db manually.

I agree with teraflop that just because etcd gives strong guarantees, that doesn't mean your application logic built on top of etcd primitives shares them. So you have to be careful about your reasoning there.

I'm curious if you're doing anything to mitigate haproxy being a single point of failure?

One thing I've had to fix in other people's HA PG setups is ease of getting back to HA after a failover. You lose the master and promote the slave, and now you've just got a master. Ideally it should be easy to just launch another db instance and everyone keeps going. I think this setup achieves that, and that's great!

3 comments

gshx 3988 days ago

Agreed, and that's a great point about human failover. It can become a challenge for distributed databases running on a large number of instances (like Bigtable), but if we're talking only about master HA, then yes, that can still be handled with human intervention, though automation is still preferable. For smaller db setups, it's much easier to just let a human/DBA intervene.


mrkurt 3988 days ago

Smaller DB setups rarely have the ops/DBA support required to do manual failover. I think having an as-consistent-as-feasible, automatic failover is something of a default expectation for databases these days, at any size.


Xorlev 3988 days ago

You need a larger team to do automatic failover because getting it right is a massive PITA. Either that or pay someone to do it right for you, e.g. RDS, managed solutions.

Manual failover is often a lot safer; automatic systems have a nasty habit of not doing what you expect them to and trashing your database or losing data.


gshx 3988 days ago

I should have clarified that I meant small in the sense of less scaled out and more vertically scaled, like a traditional RDBMS running on big iron.


winsletts 3988 days ago

Take a look at the code we have open sourced: https://github.com/compose/governor


ozgune 3988 days ago

Great write-up, thanks for sharing!

I'm curious about HAProxy being a single point of failure as well. What happens when it fails?
