Hacker News new | ask | show | jobs
PostgreSQL 15 (postgresql.org)
655 points by jkatz05 1346 days ago
17 comments

This release includes a feature I added [1] to support partial foreign key updates in referential integrity triggers!

This is useful for schemas that use a denormalized tenant id across multiple tables, as might be common in a multi-tenant application:

    CREATE TABLE tenants (id serial PRIMARY KEY);
    CREATE TABLE users (
      tenant_id int REFERENCES tenants ON DELETE CASCADE,
      id serial,
      PRIMARY KEY (tenant_id, id),
    );
    CREATE TABLE posts (
        tenant_id int REFERENCES tenants ON DELETE CASCADE,
        id serial,
        author_id int,
        PRIMARY KEY (tenant_id, id),
        FOREIGN KEY (tenant_id, author_id)
          REFERENCES users ON DELETE SET NULL
    );
This schema has a problem. When you delete a user, it will try to set both the tenant_id and author_id columns on the posts table to NULL:

    INSERT INTO tenants VALUES (1);
    INSERT INTO users VALUES (1, 101);
    INSERT INTO posts VALUES (1, 201, 101);
    DELETE FROM users WHERE id = 101;
    ERROR:  null value in column "tenant_id" violates not-null constraint
    DETAIL:  Failing row contains (null, 201, null).
When we delete a user, we really only want to clear the author_id column in the posts table, and we want to leave the tenant_id column untouched. The feature I added is a small syntax extension to support doing exactly this. You can provide an explicit column list to the ON DELETE SET NULL / ON DELETE SET DEFAULT actions:

    CREATE TABLE posts (
        tenant_id int REFERENCES tenants ON DELETE CASCADE,
        id serial,
        author_id int,
        PRIMARY KEY (tenant_id, id),
        FOREIGN KEY (tenant_id, author_id)
          -- Clear only author_id, not tenant_id
          REFERENCES users ON DELETE SET NULL (author_id)
          --                                  ^^^^^^^^^^^
    );
I initially encountered this problem while converting a database to use composite primary keys in preparation for migrating to Citus [2], and it required adding custom triggers for every single foreign key we created. Now it can be handled entirely by Postgres!

[1]: https://www.postgresql.org/message-id/flat/CACqFVBZQyMYJV%3D...

[2]: https://www.citusdata.com/

Wait, did we also get trailing comma support in column lists?
Thanks for your work on that, very useful.
What happens if someone deletes a tenant?
The posts table also has a foreign key to the tenants table specified with ON DELETE CASCADE, so all the tenant's posts will be deleted. (I don't know if Postgres makes any effort to find some optimal ordering of the referenced tables—in this case deleting the tenant's posts firsts, then its users—to avoid updating records that will just get deleted anyway by cascading deletes.)
Nice, thanks!
SQL MERGE looks great! I hope I remember it when the time comes, instead of writing 3 separate queries.

edit: Postgres docs on MERGE: https://www.postgresql.org/docs/15/sql-merge.html

Hmmm. The doc kinda suggests that this might be more efficient than doing it with separate commands:

"First, the MERGE command performs a join from data_source to target_table_name producing zero or more candidate change rows. For each candidate change row, the status of MATCHED or NOT MATCHED is set just once, after which WHEN clauses are evaluated in the order specified. For each candidate change row, the first clause to evaluate as true is executed."

Anybody know more about this?

From lots of experience with SQL Server, I know that over there, MERGE is not more efficient, it's just syntactic sugar - and in fact it's buggy syntactic sugar, as there are some conditions where it doesn't handle concurrency properly.

Section 13.2. Transaction Isolation" has some additional information regarding the behavior of MERGE. Just CTRL-F and search for "MERGE" on https://www.postgresql.org/docs/15/transaction-iso.html#XACT...
I'm actually surprised to hear that MERGE is only now available on Postgres.

I'm now interesting in hearing about other standard (what I have come to expect as standard) SQL that's not or only now available on Postgres?

Well MS SQL Merge statement is not very good, and I personally avoid it, and most places I worked in recommend to avoid it, except in the simplest scenarios

From the docs "At scale, MERGE may introduce complicated concurrency issues or require advanced troubleshooting. As such, plan to thoroughly test any MERGE statement before deploying to production."

I dont know if its better in PQSQL , but they took their time, so maybe it is

> Well MS SQL Merge statement is not very good

That is rather an understatement. IIRC unless you are careful with transaction isolation levels and other options, a single MERGE statement can still deadlock against itself.

See https://michaeljswart.com/2021/08/what-to-avoid-if-you-want-... and the older page it links to, mongst many references to the problems you can encounter.

Though things are a fair amount better than they used to be, back in 2008 when MERGE was first introduced, with most of the persistent buggy behaviours pertaining to specific features that not everyone uses, I still completely avoid it.

Postgres’ history usually suggests they don’t ship broken database features which is why most of us reach for it as a first option when choosing a database. The MSSQL warning sounds bad enough that I’d never use this feature!
Well, the Postgres docs about MERGE include a similar warning: "When MERGE is run concurrently with other commands that modify the target table, the usual transaction isolation rules apply; see [Concurrency control / Transaction isolation] for an explanation on the behavior at each isolation level. You may also wish to consider using INSERT ... ON CONFLICT as an alternative statement which offers the ability to run an UPDATE if a concurrent INSERT occurs. There are a variety of differences and restrictions between the two statement types and they are not interchangeable." https://www.postgresql.org/docs/current/sql-merge.html
That's not a warning, instead, it's how it should work, and how one would want it to work, i.e. that the transaction isolation rules apply. Lack of this, would have warranted a warning.
Agreed. The amount of deadlocks merged causes in MS SQL is pretty insane.

We tried to use it for "upsert" type capabilities and even that would cause weird deadlocks.

Postgres already has the `Insert foo on conflict do update` type syntax which I think is generally better.

> Postgres already has the `Insert foo on conflict do update` type syntax which I think is generally better.

INSERT... ON CONFLICT is awesome, but it has some limitations.

The one I ran into most commonly is that it can only handle exactly one unique constraint on the target table. So if you have both a PK and another unique index, you need to choose which one gets the simple 'on conflict' and which one gets a hacky workaround (locks/transactions, triggers, exception handling, etc.)

If I'm reading the MERGE docs right, you can handle that case:

   WHEN MATCHED AND old.pkey = new.pkey THEN UPDATE SET value = new.value
   WHEN MATCHED AND old.col1 = new.col1 AND old.col2 = new.col2 THEN UPDATE SET reps = reps + 1
   WHEN NOT MATCHED THEN INSERT [...]
MERGE does not have locking built in in the way that ON CONFLICT does, so it handles more cases and fewer cases all at the same time.
Agreed, another vote here from painful experience to avoid MERGE in SQL Server like the plague.
Is that a consequence of MSSQL using row locks rather than MVCC by default? Or is it still the same even when you enable snapshot transactions?
It kind of did support it before. You could do an INSERT … ON CONFLICT ( keys here) DO UPDATE update query here
This only works if you reference a combination of columns that has been declared as unique which is not always something you want or can do.
Merge has a few more capabilities beyond that, but I think that's where you'd want to use it the most.
The reason is that PostgreSQL has INSERT ... ON CONFLICT which is usually what you want, especially since it handles concurrency in the way you usually want. MERGE has more capabilities but not enough of them to make such a complex feature prioritized.
At least for some cases, there was a workaround by using INSERT ... ON CONFLICT
I would argue that INSERT ... ON CONFLICT is not just a workaround but the correct solution in most cases. It is very explicit about what you want and makes sure that either it can take the correct locks or it will error out if there is no unique index/primary key that it can use to take the lock. But, yes, MERGE can do more things than INSERT ... ON CONFLICT.
I'm currently neck deep in a decent sized oracle to postgres project, and MERGE INTO saved me many many hours
> PostgreSQL 15 lets users create views that query data using the permissions of the caller, not the view creator. This option, called security_invoker, adds an additional layer of protection to ensure that view callers have the correct permissions for working with the underlying data.

Thank you, kind friends. This is a huge QOL improvement when using row-level security with views and is the top reason I'll be upgrading from Postgres 13 to 15.

I suspect this will make Supabase very happy! They really believe in row level security as a major line of defense so I imagine this makes it even better
This will be great for PostgREST too.
This should have always been the default behavior. Is it the default now?

Shocked it never was since it is a major source of data leaks (which completely defeats the purpose of using row-level security).

You're mixing up two orthogonal concepts. In the absence of row-level security, allowing a user read access to a view only (which may only have a filtered or condensed view of the underlying tables) is the more secure solution. If such a view would only work if the user also had access to the raw underlying tables, that would be a major source of data leaks.

In the presence of row-level security, the same concept allows you to create a system where (for example) the user can still access aggregate data for parts of the table where the individual rows are hidden to them. A simple example: a manager has full access to the salary records of his direct reports, but there's also a view that shows the wage balance per-team. If that view were to have invoke-as-user permissions, the manager would still only be able to view his own teams' wage balance. By making that view execute as security_definer instead, the admin can once again exercise full control over who gets access to which data.

So no, I don't think it should not be the default behaviour, and no, it does not completely defeat the purpose of row-level security.

What might you use it for? I love Postgres and am always looking for inspiration
I'm not doing this, but it would be very useful when using row level security in a multi-tenant application. You can create a single view for "all active orders" (or whatever, just an example) and querying that view from different users would now give you the correct (user limited) results. It sounds like previously this was not the case.
Exactly right. We can isolate customers from one another with policies on the table, so once the policy is in place, it's actually impossible for us to write code that exposes data from one customer to any others. But if you created a view on that table, querying the view would expose all the underlying data in the table, effectively removing the policy.

Previously the only way I found to get around this was to define a function with security_invoker, then create a view based on that function. But this change removes the need for this extra function, and you can create views that use row-level security directly.

Echo this. It's going to simplify a bunch of view definitions and policies.
I hate to ask a stupid question, but I'm new to administering a Postgres database.

Do admins usually upgrade their DBs with each major release? I'm guessing it's highly contextual and depends on how easy it is to do so, but I've heard about places that never upgrade until it's a huge problem for them to do so (in order to avoid an even worse problem.)

I don't unless I want to use something. The main consideration is hedging against an unusual problem in a new version vs. my software.

This may depend on your situation. For example, if you were running a large IT org with a few staff members with a lot of installed software, the cycle to process and check correct operation of each software under your management may be long enough that you'd start soon: by the time you are working on the last few systems under your responsibility, EOL for that Postgres version may be in sight.

You may also be working on a very new software project, where bugs are of lower consequence and you'd like to forestall the friction of an upgrade motivated by EOL. Or, you may find yourself in a lull in useful development and this is a good way to get ahead of maintenance, effectively reducing pressure in the future. In this case, I'd also consider upgrading...after a few months. Even Postgres has somewhat iffy .0 releases.

My own experience is in vertically integrated teams where we control all the software, so we typically upgrade every second or third version, leaving over a year to resolve issues before EOL of the version we're on.

I don't think I've worked in an environment where we've upgraded for every release. New projects may start on that newer version, but generally speaking for older projects there's a cost calculation done for the newer features versus the lift required to do the upgrade and ensure it doesn't introduce any regressions.

Someone already mentioned the view permissions shift here which looks at caller permissions versus the view creator permissions, that will be compelling in a lot of cases so like this is something that I'd probably raise internally for my team for a few of the apps we maintain and then have a back and forth with the principals, if we like it then go through our current list of work with a PM and our manager and see if it makes sense to do now with the current pipeline of work etc.

We are heavy on-prem Postgres users. We currently run all major versions from 10-13, and are upgrading some clusters to 14 very soon. Old versions are mostly to support internal tools that we're not upgrading for various reasons, and will go away when the apps that use them do.

We generally stay one major version behind on the main production clusters. The major reason is usually just scheduling - PG upgrades are large projects that touch a lot of things, take a lot of prep and impact every part of our business, so scheduling it is always a fraught negotiation.

If you don't have a lot of complexity, I recommend finding a cadence that doesn't hurt, but sticking with it. Once you get far enough behind, it becomes harder to upgrade, mostly because you also weren't upgrading surrounding tooling, so you end up changing a lot of things at once.

As your environment grows in complexity and use, at some point I think you'll find that nontechnical concerns start to dominate these decisions.

Where I work we've recently upgraded many PostgreSQL servers from 9.x/10.x/11.x to 14.x.

A very cool PostgreSQL feature is that, if you install a second instance on the same server, it will by default use a different path and port to run the second instance.

  root@example:~# pg_lsclusters
  Ver Cluster Port Status Owner    Data directory              Log file
  11  main    5432 online postgres /var/lib/postgresql/11/main /var/log/postgresql/postgresql-11-main.log
  14  main    5433 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
Now you can dump and import databases on the second instance and test your applications with the new version. If it all works as expected, use pg_upgrade[1] to migrate everything to the new instance and switch the ports after the migration.

I have only done this migration process on Debian systems but in my experience it worked fantastic!

[1] https://www.postgresql.org/docs/current/pgupgrade.html

No, but as a PostgreSQL fan I upgrade the databases every major release for my side projects. I always wait until .1 or .2 is released though.
If only they made the upgrade automatic.

I use it in a container mainly, and MariaDB has an option to enable auto upgrades, but postgres requires you to manually upgrade every time.

For small databases with lax uptime requirements I find it pretty easy to upgrade. Just stop the database, run pg_upgrade_cluster (a Debian tool) and create a new replica.

For large databases where you care about uptime upgrading is usually a huge project anyway, since you have to make sure that nothing breaks due to the new version (check query plans, check if the new version changed anything, etc). You also probably want to use logical replication for the upgrade. There are things that PostgreSQL can improve of course.

As an aside I recently was involved in an upgrade from 9.6 where we used logical replication and pg_dump (yes, we really used pg_dump on a 20+ TB database). The reason for using pg_dump was because we have some tables which bloated very fast meaning we could not hold a snapshot open on the master long enough to do the initial data copy so the initial copy was done with pg_dump from a paused replica. And to not lose any data during pg_dump/pg_restore we kept a logical replication slot alive (but unused until pg_restore was complete). That was an interesting project, especially since Googling did not turn up anyone who had done the same.

I'd genuinely love to read about that if you're able to share more?
Isn’t keeping a logical slot alive holding a snapshot?
Not normally, no. A real snapshot is held during the initial creation of the logical slot. After that the slot's "catalog xmin" prevents old catalog (but not user table!) row versions from being removed, but that's somewhat different from a snapshot.
Major version upgrades by definition contain breaking changes. Automatic upgrades across major versions will break things. Still, some distributions do handle this somewhat sanely. For example, Debian and derivatives such as Ubuntu let you have multiple versions installed at the same time and provide a pg_upgradecluster tool to handle the upgrade automatically in most cases. It just won't do it while you are asleep (thankfully).
I've heard a saying "just say no to dot oh" -- encouraging one to skip the .0 release and wait until .1 or .2 before doing the upgrade.
This will stop working when not enough people adopt .0 versions anymore, because errors from regular usage will not pop up. .0 will become the new beta release that nobody cares about.
I typically update to .0 on my dev machine ASAP, and if there are no problems after a bit I'll update the server to .0 (or usually .1, since that'll usually be out by then).

I think that's a nice compromise; although not a "real" test as the load is a lot less. Also things may work in dev but not production, but unless you explicitly use new features that's rare.

For larger setups you can us canary servers and whatnot, but I prefer to work at small companies and keep my sanity.

For context my ideal setup has at least 3 environments - prod, qa/testing before release, and dev. With dev being the most likely to break, and the qa/testing for when we need to be absolutely certain that differences between prod and dev aren't masking any bugs.

I do like the idea of "canary versioning" some stuff. It's a bit of a game theory issue isn't it, since we all need prod to work at all times and can't risk a .0, and a major version difference between dev and prod can mask some nasty issues in my experience.

Luckily nowdays we have enough brave persons who just take whatever is postgres:latest image from Docker hub.

Unluckily, chances they will report bugs not that high - at least I don't expect bugs related to any moderate size/loaded databases be found by this way.

As someone who maintains Postgres 9.2, 9.4, and 9.6 databases. No, they do not.

(It is not my will that these databases aren’t on newer versions, I would very much like them to be)

Just curious, is the issue a time thing, or is it on a system that can't easily upgrade it - or is it a breaking changes thing that would require refactors?

Basically I'm just curious what broke for you

One main challenge is to upgrade a live large database smoothly, without downtime.

I believe minor version upgrades are easily enough since they maintain disk format compatibility and you can simply spin up a new replica server under the new postgres version and then when it's synched with the main database you can make the switch and kill the old version. I'm not so sure about migrating to a new major version (ex: 9.x to 15.x).

Thing is, in this specific context, as I understand it, this is much easier to do post 9.x.

Notably, Postgres has more formal support for logical replication. Logical replication is distinct from the normal replication in that it forwards actual SQL to the replica rather than binary write ahead logs.

The SQL is portable across versions more than WALs are. So with logical replication you can copy a live db to a new version. Then you just need some (ideally) brief down time to swap out the servers.

There are solutions for 9.x, but it got much better later on.

My experience is mostly that you either upgrade because it has a feature you need (then you try it out fast), if it has some performance or other nice thing (then you go for the .1) or you don't really care then you upgrade when your current version goes EOL or if you upgrade your underlying server OS (if we assume something like Ubuntu LTS, and they'd go with 14 instead of 12 or something).
When a new version reaches RDS we start planning for the upgrade. We like the new features so we try to stay as current as we can.
Our current policy is to update to the N-1 release. So we're currently running v13 since v14 was the latest but will soon upgrade to v14 since v15 is now the latest.
We try to upgrade at least once a year, during our slow season (ed-tech, so summer time is < 1/10th peak traffic).

We use AWS Aurora with Postgres and we autoscale our cluster. Because most upgrades have a few perf improvements we have found that we save a bit of money every time we upgrade, either in reduced CPU (which translates to running fewer instances) or in better memory efficiency (which boosts buffer_cache_hit_ratio and therefore reduces IO).

It takes us a few months to upgrade, with planning, testing, co-ordinating downtime, etc. We upgrade every other year, to keep us from falling too far back.

Very excited for this year, as we'll be using logical replication to migrate, which should make this much less risky. (since the old systems are untouched).

In the past, we used pg_upgrade for in place upgrades, and even though we took snapshots before, still seemed a bit stressfull when it happened.

The version support is five years: https://www.postgresql.org/support/versioning/

As long as you stay well ahead of that, you are fine.

However, I'd strongly recommend testing your applications with new versions as they come out, or in beta (even better). You may learn about new features, find bugs in your application, find bugs in your migration processes, etc. If you bring any problems to the community right away, then by the time you actually want to upgrade, the problems may already be solved in the newer release.

According to https://www.postgresql.org/support/versioning/ the "Final Release" (whatever that means) for v14 is November 2026, and for v15 is November 2024. I wonder if it's a typo or some LTS thing.
Almost certainly a typo. PG releases get 5 years of support.
I manage two separate projects at my work, I would never upgrade to a .0 update, but we do every update it either when there's features we want, over every ~2~3 years for the general improvements.

We use django and some other stuff that has very decent support for postgres, and in my experience postgres updates rarely have breaking changes that require much if any changes for us. This also keeps us from being "trapped" on an old version.

I reiterate from all my systems management - do not update to a .0 release of anything unless there is a critical security update and you're forced to.

EDIT: Oh directly for the question - eh depends. Every 2 to 3 max, and usually after it's tested by others more.

And new projects generally use whatever is newest unless it's a .0 release.

Note that vendors know this and some release few patch versions quickly with only few patches to make people move to newest version (and get the community test it by using it). No idea if Postgres is one of these.
Postgres has a strict schedule of quarterly bugfix releases, and only deviates from that when serious bugs occur (very rare). So, no.
Depends on the app but usually don't major version upgrade unless the current version is losing support or there's a compelling new feature. I wouldn't run on the latest until a few point releases are out (say .3 or .4).

I haven't worked on anything recently with a good performance testing system so that weighs in a bit, too (there's not a high degree of confidence the app won't hit a weird perf regression or edge case)

If it's using a managed service, that weighs in, too. Something like AWS RDS Postgres usually has a 6 month lag + give it some additional time to work out any AWS control plane "surprises"

Corruption of database storage is the biggest fear so most upgrades are done cautiously. Historically postgres versions were released much much slower so the upgrade cadence was slower too. That said, we've been upgrading more frequently, especially since a lot of performance features are being released. So far our upgrades have been executed flawlessly for our cluster of postgres servers (~30 postgres chains each made up of 4x large dedicated servers).
Postgres has released a new major version each year for a long time.

The only change is that versions used to be X.Y.Z, and now it just used X.Y. Now, X gets incremented for every major version, so it seems like it's moving faster.

The difference is that in place upgrades aren't possible for major versions.
pg_upgrade is the way to do in-place upgrade between major versions:

https://www.postgresql.org/docs/current/pgupgrade.html

Jeff is right, PostgreSQL has released a new major yearly the last almost 25 years.
That would only be possible if they started with version -10!
Before version 10, the versioning scheme was different, and the first number was increased about once every five years: for example, 9.6 and 9.5 were consecutive yearly major releases. Version 7.0 was released around year 2000.
This probably depends on your database, but with Postgres we usually don't stay on the old version for too long as the updates are usually very smooth and painless.
How do you upgrade and how much data do you have ?
Mostly in the 1TB range and usually by doing the replication / promotion dance. I know it gets more complicated if you have larger ones or with zero write downtime but in our case that's usually acceptable.
We do. Though we make sure to stay on the previous major. So we’ll only upgrade to PostgreSQL 15 when 16 is released.
In my experience, it’s always been a huge decision to upgrade based on lots of analysis and factors. I have never worked in a company where the team routinely upgrades when a major version comes out, as it generally doesn’t make business sense. Just my 2c…
.1+ releases might be best with pg for quality assurance
MERGE feature is interesting. But specifically on the revoking CREATE permissions for the public (or default) schema, this is a step in the right direction. Some of the defaults in Postgres can be more secure. For example, the first time I use a POSTGRES_PASSWORD to configure a password, changing this password involves more steps than just changing the values of the ENV, because it doesn't take the changed value there after.

Structured logging with JSON is going to improve a lot of debugging, again a great productive change.

Also, any idea when the docker image for Postgres 15 will be available?

Looking at the Alpine docker file(s) for Postgres you might be able to use the one for the release candidate and set en environment variable of PGVERSION=15.0 which should use https://ftp.postgresql.org/pub/source/v15.0/ here.

You would need to figure out what the package name is on Debian (if it even exists yet?) it's currently set to ENV PG_VERSION 15~rc2-1.pgdg110+1

YMMV.

What's the status of zheap?

https://wiki.postgresql.org/wiki/Zheap seems to claim it has been rebased on top of 14.1, but generally progress seems slow?

Also https://cybertec-postgresql.github.io/zheap/

PostGIS 3.3.0 mentioned another improvement in this release: "This version of PostGIS can utilize the faster GiST building support API introduced in PostgreSQL 15."

https://postgis.net/2022/08/27/postgis-3.3.0-released/

As an aside, I've been trying to learn basic GIS with PostGIS and QGIS and it's been quite frustrating. I had a dataset of roads which were broken up into short segments, which I wanted to merge back together based on a key. Theoretically that's a single simple operation, in practice it was getting hung up on something I couldn't understand and it took all afternoon. My usual practice of JIT doc reading wasn't working well, too many unfamiliar terms and missing fundamentals. If you have any recommendations for books or docs I'd love to hear them.
"PostGIS in Action" is a good option. Also check out https://locatepress.com/ which has a few more books in this niche.

> a dataset of roads which were broken up into short segments, which I wanted to merge back together based on a key. Theoretically that's a single simple operation

PostGIS provides the ST_MakeLine aggregate function for this, but you need to write the query such that the GROUP BY query retains the correct order. Creating a new line segment out of many line segments effectively means breaking the lines into their constituent points and then creating a new linestring based on the points. For things like GPS data, you can order by timestamp. But for other cases? You've got to write your aggregate query carefully so that adjacent line segments are actually meant to be merged.

Just some feedback for releases of any software: I think apt sources and repositories should be ready to go on launch and PR-release so people can immediately use the new version. Looks like that's going to take 2-3 days. Sources are available but that's not something most people want to delve in with make files and dependencies. Something like postgres is huge. Right now, if you go to downloads and expect postgresql-15 available, it is not; lot of people on IRC and elsewhere on Twitter are confused where to download postgresql-15. I know that takes time, so the PR release should just be delayed until apt sources are ready. May be also docker repositories.
> Just some feedback for releases of any software: I think apt sources and repositories should be ready to go on launch and PR-release so people can immediately use the new version.

Normally that's the case - we "wrap" the release on Monday so that packagers have time till Thursday to get packages ready. Looks like something didn't quite work out this time. Looking into what went wrong.

Part of it is that a list of supported versions on the windows, macos download pages weren't updated, despite the 15 being available. But unfortunately the Debian / Ubuntu packages are indeed not yet ready.

> May be also docker repositories

postgresql.org doesn't currently provide docker containers to my knowledge.

No worries, thanks for looking into it.
Is there anything like Galera for PostreSQL? I find it very convenient for small-scale HA and redundancy and it's quite easy to get going.
Not for free (have to pay EnterpriseDB for that). Every free option here is basically "glue pieces together to build your own HA".
I am not sure exactly what Galera does, but you may want to look into Citus (https://www.citusdata.com/).
Citus is great, however do note that reading the docs you do have to change your scheme to use Citus. So it's not really "drop-in" par say.
I don't know much about Galera, but Patroni may be of interest to you
Galera is multi-master but I'm not sure that's as important with Postgres (it has good baseline performance & can fail over quickly).

Patroni is great for managing active-hot standby clusters

I'm a little bit confused on the "sorting perf improvements" bit. Does that mean that if I have a query with a `SORT BY`, it will literally "just be faster"? Surely that sounds too good to be true...?
Basically, yes, it will literally just be faster.

https://techcommunity.microsoft.com/t5/azure-database-for-po...

On a sidenote it's amazing to see how much MS is contributing to open source.
To give credit where credit is due this was not just a Microsoft contribution. The four mentioned contributors were all from different companies: Microsoft, Dalibo, Greenplum and EnterpriseDB. Microsoft employs some of the core contributors of PostgreSQL, but many patches come out of cross company collaboration.
Their military contributions are also awe inducing
:O

Thanks for the link!

Could someone give some examples on their own domains where the MERGE command is a huge QOL improvement over what's currently available?

I see a lot of people being so very happy in the comments, and, well, I've tried to think long and hard about how to apply it to my current domain but was a bit at a loss... Maybe some practical examples can help?

The operation it's replacing is something like "SELECT, followed by UPDATE/INSERT".

Implicit in that sequence is transmitting the selected rows over the network, and buffering the rows in-memory on the client side.

With MERGE, you eliminate the network stress, and push the burden of managing the rows in-memory onto the postgres server.

That's quite nice if you have beefy operations and want to keep the services/jobs running those operations lean.

You can already do SELECT followed by UPDATE/INSERT in a single query in postgres using CTEs...
You can do them individually yes, but you can’t do INSERT and UPDATE from the same SELECT CTE.

Before, you’d have to either load the data in the client side or duplicate the CTE across two statements in a transaction.

You can, like this:

    WITH results as (
      SELECT ...
    ),

    inserted_rows AS (
      INSERT ...
      SELECT * FROM results
      ...
    ),

    deleted_rows AS (
      DELETE ...
      USING results
      ... 
    )

    UPDATE ...
    FROM results
Woah! I stand corrected, thanks.

It seems you’re knowledgeable on this, do you know if MERGE useful beyond expressivity?

Works well for bulk operations where you're loading data in on a lower frequency.
it's common to use it to load Slowly Changing Dimensions in data warehouses, at least in other systems, so it's nice to have the same-ish syntax in PG
Glad to see all the new regex functions. I recently moved a database from AWS Redshift to Postgres on Heroku and was shocked to see how many functions like regexp_substr() weren't available. Wish this had come sooner so I didn't have to rewrite so many of my queries.
That’s an interesting change from a analytical to transactional postgres flavour. What workload is on this DB?
The app is so small that we didn't need a separate database copy just for reporting.
jsonlog looks pretty neat! Structured logging is going to make a lot of tooling much easier to write
> Queries using SELECT DISTINCT can now be executed in parallel.

This sounds quite interesting, but I would assume it does not always work? I didn't see this mentioned in the linked documentation, does someone know when/how the parallel distinct works?

Couldn't tell you the when, but I can tell you the how is likely how you'd expect.

Generally speaking, to do distinct you need a dictionary to look up previously seen values. To do it in parallel you need to make that dictionary thread safe.

For Java, such a thread safe dictionary is made by segmenting the table and synchronizing on the segments. So you'd hash your values, figure out which segment that targets, lock that segment, and then read/update that segment to contain the new value.

I'd assume that postgres is doing a fairly similar trick, The only additional synchronization would be on a linked list of found values. In that case, you could either lock the list and update as new values come in, you could sort those values after the fact, or you could employ a lock free algorithm to add nodes to the list (see lock free queue implementations).

Most parallel operations in PG are implemented by simple merge the dataset, work independently and merge the results. I expect the new distinct to behave the same and not work on a shared data structure.
Thanks! May be helpful to include this in the documentation, since I guess it will then often depend on the numDistinctRows estimate [1] if the parallel plan is used.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f...

"select distinct" is likely now syntax sugar around "select ... group by 1", which worked in parallel for a while.
Congrats PG team and community
Interestingly, there are no breaking changes that were required to be addressed by Hasura GraphQL Engine to support Postgres 15. Hasura is fully compatible with this release, with the potential of adding the MERGE command via the GraphQL API soon.

Excited about the incremental performance improvements and making more secure defaults by revoking CREATE permission for public schema for non-superusers.

I hope that someday it will become as popular as MySql. Although I see complicated, since many companies use other alternatives and in my experience it is complicated to make the migration when you have many years using the same technologies.