Hacker News new | ask | show | jobs
by mw44118 4296 days ago
There's no install instructions in the .zip file, but there is a github wiki. But that has obsolete instructions. But there's an open issue, and the comments for that issue include fixes to that.

I stopped messing with it once I saw how they wanted me to install something into root's crontab.

My hypothesis: postgresql has none of these jazzy monitoring tools because it never crashes.

Sure, as traffic grows, slow queries need to be optimized, crappy N+1 ORMs need to be replaced, and boxes need more RAM and faster disks, but when was the last time you heard of anyone complain about postgresql or blame it for any downtime?

3 comments

/snark I've certainly had PostgreSQL crash and all it took was someone in a Verizon datacenter accidentally yanking out the cables connecting the server to its RAID arrays. It even crashed a second time when someone "tripped over a cable" and took out the arrays again and a third time when the redundant power supplies turned out not to be...

OTOH complete loss of storage resulted in sub-second data loss with autorecovery the first two times using a five year old Pg installation and minimal issues the third time. The master server had a three year uptime on the main server process when it went down. Overall I'm pretty comfortable with that track record.

I want to be able to scale out by adding more machines. I want to be able to failover automatically to another data center when the first one goes down. I have yet to see a straight forward way to accomplish this with PG. Their wiki lists a bunch of tools related to this but they are either abandoned or does not cover this as far as I can tell. I can't understand all the positive I read about PG..
> I want to be able to failover automatically to another data center when the first one goes down. I have yet to see a straight forward way to accomplish this with PG

From personal experience (also from others: https://github.com/blog/1261-github-availability-this-week), I don't think you want to failover a database automatically.

Of all the issues that could cause a machine to decide that a failover is needed, most of the root causes make a failover actually non desirable (a hardware failure for example (failover is good) is way less likely than non-reachability due to load (failover is disastrous) and unless you are very careful, an automated solution will act the same way in both cases.

Add to that the huge cost of failing back during which time there's no more slave to fail over to: Until 9.4 ist released, failing back requires you to file-system-level copy all the data back to the failed master to bring it back up as a slave.

After 9.4, re-synchronizing an old failed master to the new master will actually be possible in most cases (a mistaken failover is usually covered by these).

In case of an emergency, first make sure that a failover would actually help (if you're down because of high load and a misconfiguration of your system, failing over won't help, but will only make things worse), then fail over manually.

As I said, there are way fewer possible emergencies where failing over would help compared to many, many more where failing over would actually cause more damage.

This is valid for all non master-master database configurations I've had to deal with so far, but, again, it's even more pronounced with postgres because of the very time (and bandwith - which could mean "costly" when you cross the public internet) consuming failback (during which you have nowhere to fail over to again).

If you really, really want to do it, have a look at pgpool (http://www.pgpool.net/) which can automatically fail over to a slave and which also is able to read-load-balance between one or multiple slaves. It's quite the out-of-the-box solution.

I'd say the way failover/HA is done at Heroku is straightforward. Will (the designer) or I plan to write about it some day, pending laziness.

It took some time to figure out because it required breaking some orthodoxy, but I'm happy with the result.

The thresholds are documented at https://devcenter.heroku.com/articles/heroku-postgres-ha#fai....

The promotion is done by rebinding the URL of the database and restarting the app. This shares mechanism neatly with changing of passwords, which is one reason we decided it was worth throwing out network-transparent orthodoxy when it came to HA: the clients must be controlled anyway to deal with security considerations.

One approach I used when we migrated data centres to avoid having to manage timing etc. of IP address changes in our apps, was to use haproxy.

We configured slaves in the new data centre, set up a haproxy in the new data centre that used the old data centre databases as the main backend, and the new data centre databases as the backup, changed the apps to point to haproxy, shut down the masters and let haproxy shift all the traffic, promoted the slaves once we were certain they'd caught up. We had a period of seconds where the sites were effectively read-only, but that was it.

We're planning rolling out haproxy with keepalived or ucarp to mediate between a lot more of our backend services

> I'd say the way failover/HA is done at Heroku is straightforward.

> It took some time to figure out because it required breaking some orthodoxy

If it's hard to figure out, it does not sounds too straight forward.

Only because the idea of manipulating clients of the database to do HA is somewhat heretical.
Just wanted to put in a vote for a writeup. Mind sharing a few more details here in the meantime?
Postgres has had hot standby replication since 9.1, along with streaming replication.

https://wiki.postgresql.org/wiki/Hot_Standby

As for scaling horizontally at the db level that can be achieved with foreign data wrappers calling out, but isn't built in.

While Postgres doesn't rival oracle and ms SQL server in feature checkboxes it's a very solid DBMS with many advanced SQL features, and it's free and open source. You can do a lot without hitting the scaling problems.

To be honest, I'm pretty tired of the "You can do a lot without hitting the scaling problems.".

I hear it all the time. Of course, if you don't have a scaling problem you don't. But when you have, you need a solution to it.

Well, in the world of free software you have a choice - an easily-scaled database system that offers you few guarantees and is very hard to code safely against, or one that is harder to scale but easier to code safely against. That doesn't make PG bad - it's just a choice you make. If you want to pay, you can get a little bit closer to having both.

Of course you hear lines like you quoted all the time - that's because it's true. The overwhelming majority of applications don't have needs beyond a master-slave pair of DB systems - and trends are moving in favour of that direction every day as RAM gets cheaper. Stack Overflow runs on a single not-that-beefy master-slave DB system.

Typical web applications are insanely amenable to read caches, and that's where you do most of your scaling. If you're at the point where your scaling needs truly exceed a large single DB system, I'd hope you have some money to throw at the problem.

Imagine my users table has a unique constraint on email address. Or my hotel_room_reservations table has to guarantee I never store overlapping reservations for the same room. In either case, on insert and on update, I need to check the new or updated row vs a bunch of other rows.

Those scenarios make horizontal scaling not so simple. Sure, get rid of those constraints, and you can insert as fast as your network can ship the bytes. But that's not a database anymore. That's a glorified flat file.

I haven't tried it yet, but Postgres XL is pretty new, and promises this. http://www.postgres-xl.org/
No it doesn't. "In terms of automatic failover, it is currently not part of the core project".
I agree that the documentation still need a lot of work. However, it doesn't say nor suggest to setup any crontab as root. As the only requirement is to have a stored procedure called regularly, you can use local postgres crontab for example, or any other place else. I will clarify this point on documentation as soon as possible.
Thanks!