Hacker News new | ask | show | jobs
by ExoticPearTree 520 days ago
I wish someone that has resources to invest into making the replication in PostgreSQL seamless, easy to configure and easy to change masters without anything like pgbounce and friends.

Otherwise, PostgreSQL is fantastic.

12 comments

Pgbouncer is a connection pooler; it has nothing to do with replication.

That said, it’s not that hard to set up replication [0]. Properly tuning the various parameters, monitoring, and being able to fix issues is another story.

RDBMS is hard. MySQL is IMO the easiest to maintain up to a certain point, but it can still bite you in surprising ways. Postgres appears to be as easy on the surface, buoyed by a million blog posts about it, but as your dataset grows, so does the maintenance burden. Worse, if you don’t know what you should be doing, it just eventually blows up (txid wraparound from vacuum failures probably being the most common).

[0]: https://www.postgresql.org/docs/current/runtime-config-repli...

> txid wraparound from vacuum failures probably being the most common

Hopefully OrioleDB can upstream all the necessary changes soon. For those who don't know, it's a storage engine for Postgres that uses undo logs instead of vacuuming old records.

OrioleDB maintains a very small set of Postgres patches that are targeted for upstream. The storage engine that mitigates the need for vacuuming [1] is implemented in a standard Postgres extension, so that will still need to be installed by the Postgres host in order to take advantage of it.

But yeah, looking forward to that day too!

[1] https://www.orioledb.com/blog/no-more-vacuum-in-postgresql

Yes, you are right, but I was thinking of the ease of having a high-availability setup and pgbouncer was the first thing that came to mind.

And yes, I read the documentation and it is still cumbersome to have an HA setup that is easy to maintain. This is what I mean and hope it is clearer now.

Problem is primary devs of Postgres are consulting shops that make money in this space so making it seamless would affect their revenue
I work on PGD but I'm pretty sure we actively support open-source options for HA:

https://github.com/EnterpriseDB/repmgr

Repmgr is handy, but it is not a replication system per se, but rather makes managing a bunch of moving other parts easier.

Personally I'd like BDR to be in the main tree, or something else, equivalent to Galera:

* https://galeracluster.com

It's not perfect [1], but it'll get you a good way for many use cases.

[1] https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster

I'm hoping for seamless upgrades at some point, like MySQL does it — you install the next major version and restart the daemon. It's completely unnecessary for major operators, but is a life-saver for small businesses, websites that don't need 99.99999% of uptime (i.e. almost all of them), in development, etc.
How do updates work with postgres?
Well, three ways:

1. Dump a backup to disk, then restore from dump on new version;

2. Stop the old version, then run `pg_upgrade --link` on the data of the old version which should create a new data directory using hardlinks, then start the new version using the new data directory. This is rather quick; or

3. Use Logical Replication to stand up the new version. This has ... a few caveats.

You do a backup on the old version and a restore on the new one.

From what I understood, sometimes the way data is written to disk differently between versions and they're not compatible. I guess due to optimizations or changes in the storage engine?

Out of the box HA would make it complete. Apart from patroni, anyone experience with https://github.com/hapostgres/pg_auto_failover?
This is the exact reason why we went with MySQL. Many asked why, A small team with limited resources. You cannot go to production without HA, and many time cloud may not be a solution. For small use cases MySQL Innodb Cluster is a simple elegant solution to setup.
I agree, Postgres is great but I think it's just a little crusty for cloud environments. Clustering should be built in and easy to configure.
Isn't citus quite good at that? https://www.citusdata.com/
I think Citus only does sharding not replication. You can use it together with Patroni though.
What we really need is a de-facto open source tool to manage logical replication seamlessly. The devs have built all the pieces, and as of v17 there are very few limitations. But I am constantly tripping over the fact that I cannot observe how the replication is progressing, and a good way to mitigate failures.
Take a look at https://github.com/hapostgres/pg_auto_failover it's quite simple to use and manage. You can use libpq support for target_session_attrs (been there since Pg 10 - https://paquier.xyz/postgresql-2/postgres-10-libpq-read-writ... ) so you don't need active loadbalancer in front of your Pg cluster.

You can however create your own health/status check service on top of pg_autoctl show state and use HAProxy if required.

I don't think there's something easier to setup and manage than pg_auto_failover, Patroni always appeared very complicated to me.

I would say Pgpool-II is still the best we got.

This video covers pretty much all the practical stuff. I timestamped the parts in a comment.

PgPool II Performance and best practices https://www.youtube.com/watch?v=bMnVS0slgU0

Oh man, this takes me back to 2005 when I first moved to Pg and needed replication. Its replication is statement based, i.e. sends each DML query to all nodes. That means every write query must be deterministic. That wasn't always obvious. Though it did work for us until we moved to Slony then block level WAL replication. Thankfully we have logical replication today, even if it's still a bit more painful than MySQL.
> Slony

Gosh, haven't heard that in years. I remember a company I used to work for used it on their old system but the new system didn't use it for whatever reason.

Turning on replication in neon.tech is literally just a switch. And it plays well with a CDC library of mine:

https://github.com/cpursley/walex

Something pg native would be fantastic.
Patroni is pretty good at this.
Can patroni replicate across mixed versions of Postgres? (i.e. can it do logical replication? [and handle DDL replication?])

I thought patroni only did physical replication (only replicates across the same version of Postgres). But maybe I'm mistaken.