This release includes a feature I added [1] to support partial foreign key updates in referential integrity triggers!
This is useful for schemas that use a denormalized tenant id across multiple tables, as might be common in a multi-tenant application:
CREATE TABLE tenants (id serial PRIMARY KEY);
CREATE TABLE users (
tenant_id int REFERENCES tenants ON DELETE CASCADE,
id serial,
PRIMARY KEY (tenant_id, id),
);
CREATE TABLE posts (
tenant_id int REFERENCES tenants ON DELETE CASCADE,
id serial,
author_id int,
PRIMARY KEY (tenant_id, id),
FOREIGN KEY (tenant_id, author_id)
REFERENCES users ON DELETE SET NULL
);
This schema has a problem. When you delete a user, it will try to set both the tenant_id and author_id columns on the posts table to NULL:
INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 101);
DELETE FROM users WHERE id = 101;
ERROR: null value in column "tenant_id" violates not-null constraint
DETAIL: Failing row contains (null, 201, null).
When we delete a user, we really only want to clear the author_id column in the posts table, and we want to leave the tenant_id column untouched. The feature I added is a small syntax extension to support doing exactly this. You can provide an explicit column list to the ON DELETE SET NULL / ON DELETE SET DEFAULT actions:
CREATE TABLE posts (
tenant_id int REFERENCES tenants ON DELETE CASCADE,
id serial,
author_id int,
PRIMARY KEY (tenant_id, id),
FOREIGN KEY (tenant_id, author_id)
-- Clear only author_id, not tenant_id
REFERENCES users ON DELETE SET NULL (author_id)
-- ^^^^^^^^^^^
);
I initially encountered this problem while converting a database to use composite primary keys in preparation for migrating to Citus [2], and it required adding custom triggers for every single foreign key we created. Now it can be handled entirely by Postgres!
The posts table also has a foreign key to the tenants table specified with ON DELETE CASCADE, so all the tenant's posts will be deleted. (I don't know if Postgres makes any effort to find some optimal ordering of the referenced tables—in this case deleting the tenant's posts firsts, then its users—to avoid updating records that will just get deleted anyway by cascading deletes.)
Hmmm. The doc kinda suggests that this might be more efficient than doing it with separate commands:
"First, the MERGE command performs a join from data_source to target_table_name producing zero or more candidate change rows. For each candidate change row, the status of MATCHED or NOT MATCHED is set just once, after which WHEN clauses are evaluated in the order specified. For each candidate change row, the first clause to evaluate as true is executed."
Anybody know more about this?
From lots of experience with SQL Server, I know that over there, MERGE is not more efficient, it's just syntactic sugar - and in fact it's buggy syntactic sugar, as there are some conditions where it doesn't handle concurrency properly.
Well MS SQL Merge statement is not very good, and I personally avoid it, and most places I worked in recommend to avoid it, except in the simplest scenarios
From the docs
"At scale, MERGE may introduce complicated concurrency issues or require advanced troubleshooting. As such, plan to thoroughly test any MERGE statement before deploying to production."
I dont know if its better in PQSQL , but they took their time, so maybe it is
That is rather an understatement. IIRC unless you are careful with transaction isolation levels and other options, a single MERGE statement can still deadlock against itself.
Though things are a fair amount better than they used to be, back in 2008 when MERGE was first introduced, with most of the persistent buggy behaviours pertaining to specific features that not everyone uses, I still completely avoid it.
Postgres’ history usually suggests they don’t ship broken database features which is why most of us reach for it as a first option when choosing a database. The MSSQL warning sounds bad enough that I’d never use this feature!
Well, the Postgres docs about MERGE include a similar warning: "When MERGE is run concurrently with other commands that modify the target table, the usual transaction isolation rules apply; see [Concurrency control / Transaction isolation] for an explanation on the behavior at each isolation level. You may also wish to consider using INSERT ... ON CONFLICT as an alternative statement which offers the ability to run an UPDATE if a concurrent INSERT occurs. There are a variety of differences and restrictions between the two statement types and they are not interchangeable." https://www.postgresql.org/docs/current/sql-merge.html
That's not a warning, instead, it's how it should work, and how one would want it to work, i.e. that the transaction isolation rules apply. Lack of this, would have warranted a warning.
> Postgres already has the `Insert foo on conflict do update` type syntax which I think is generally better.
INSERT... ON CONFLICT is awesome, but it has some limitations.
The one I ran into most commonly is that it can only handle exactly one unique constraint on the target table. So if you have both a PK and another unique index, you need to choose which one gets the simple 'on conflict' and which one gets a hacky workaround (locks/transactions, triggers, exception handling, etc.)
If I'm reading the MERGE docs right, you can handle that case:
WHEN MATCHED AND old.pkey = new.pkey THEN UPDATE SET value = new.value
WHEN MATCHED AND old.col1 = new.col1 AND old.col2 = new.col2 THEN UPDATE SET reps = reps + 1
WHEN NOT MATCHED THEN INSERT [...]
The reason is that PostgreSQL has INSERT ... ON CONFLICT which is usually what you want, especially since it handles concurrency in the way you usually want. MERGE has more capabilities but not enough of them to make such a complex feature prioritized.
I would argue that INSERT ... ON CONFLICT is not just a workaround but the correct solution in most cases. It is very explicit about what you want and makes sure that either it can take the correct locks or it will error out if there is no unique index/primary key that it can use to take the lock. But, yes, MERGE can do more things than INSERT ... ON CONFLICT.
> PostgreSQL 15 lets users create views that query data using the permissions of the caller, not the view creator. This option, called security_invoker, adds an additional layer of protection to ensure that view callers have the correct permissions for working with the underlying data.
Thank you, kind friends. This is a huge QOL improvement when using row-level security with views and is the top reason I'll be upgrading from Postgres 13 to 15.
I suspect this will make Supabase very happy! They really believe in row level security as a major line of defense so I imagine this makes it even better
You're mixing up two orthogonal concepts. In the absence of row-level security, allowing a user read access to a view only (which may only have a filtered or condensed view of the underlying tables) is the more secure solution. If such a view would only work if the user also had access to the raw underlying tables, that would be a major source of data leaks.
In the presence of row-level security, the same concept allows you to create a system where (for example) the user can still access aggregate data for parts of the table where the individual rows are hidden to them. A simple example: a manager has full access to the salary records of his direct reports, but there's also a view that shows the wage balance per-team. If that view were to have invoke-as-user permissions, the manager would still only be able to view his own teams' wage balance. By making that view execute as security_definer instead, the admin can once again exercise full control over who gets access to which data.
So no, I don't think it should not be the default behaviour, and no, it does not completely defeat the purpose of row-level security.
I'm not doing this, but it would be very useful when using row level security in a multi-tenant application. You can create a single view for "all active orders" (or whatever, just an example) and querying that view from different users would now give you the correct (user limited) results. It sounds like previously this was not the case.
Exactly right. We can isolate customers from one another with policies on the table, so once the policy is in place, it's actually impossible for us to write code that exposes data from one customer to any others. But if you created a view on that table, querying the view would expose all the underlying data in the table, effectively removing the policy.
Previously the only way I found to get around this was to define a function with security_invoker, then create a view based on that function. But this change removes the need for this extra function, and you can create views that use row-level security directly.
I hate to ask a stupid question, but I'm new to administering a Postgres database.
Do admins usually upgrade their DBs with each major release? I'm guessing it's highly contextual and depends on how easy it is to do so, but I've heard about places that never upgrade until it's a huge problem for them to do so (in order to avoid an even worse problem.)
I don't unless I want to use something. The main consideration is hedging against an unusual problem in a new version vs. my software.
This may depend on your situation. For example, if you were running a large IT org with a few staff members with a lot of installed software, the cycle to process and check correct operation of each software under your management may be long enough that you'd start soon: by the time you are working on the last few systems under your responsibility, EOL for that Postgres version may be in sight.
You may also be working on a very new software project, where bugs are of lower consequence and you'd like to forestall the friction of an upgrade motivated by EOL. Or, you may find yourself in a lull in useful development and this is a good way to get ahead of maintenance, effectively reducing pressure in the future. In this case, I'd also consider upgrading...after a few months. Even Postgres has somewhat iffy .0 releases.
My own experience is in vertically integrated teams where we control all the software, so we typically upgrade every second or third version, leaving over a year to resolve issues before EOL of the version we're on.
I don't think I've worked in an environment where we've upgraded for every release. New projects may start on that newer version, but generally speaking for older projects there's a cost calculation done for the newer features versus the lift required to do the upgrade and ensure it doesn't introduce any regressions.
Someone already mentioned the view permissions shift here which looks at caller permissions versus the view creator permissions, that will be compelling in a lot of cases so like this is something that I'd probably raise internally for my team for a few of the apps we maintain and then have a back and forth with the principals, if we like it then go through our current list of work with a PM and our manager and see if it makes sense to do now with the current pipeline of work etc.
We are heavy on-prem Postgres users. We currently run all major versions from 10-13, and are upgrading some clusters to 14 very soon. Old versions are mostly to support internal tools that we're not upgrading for various reasons, and will go away when the apps that use them do.
We generally stay one major version behind on the main production clusters. The major reason is usually just scheduling - PG upgrades are large projects that touch a lot of things, take a lot of prep and impact every part of our business, so scheduling it is always a fraught negotiation.
If you don't have a lot of complexity, I recommend finding a cadence that doesn't hurt, but sticking with it. Once you get far enough behind, it becomes harder to upgrade, mostly because you also weren't upgrading surrounding tooling, so you end up changing a lot of things at once.
As your environment grows in complexity and use, at some point I think you'll find that nontechnical concerns start to dominate these decisions.
Where I work we've recently upgraded many PostgreSQL servers from 9.x/10.x/11.x to 14.x.
A very cool PostgreSQL feature is that, if you install a second instance on the same server, it will by default use a different path and port to run the second instance.
root@example:~# pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
11 main 5432 online postgres /var/lib/postgresql/11/main /var/log/postgresql/postgresql-11-main.log
14 main 5433 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
Now you can dump and import databases on the second instance and test your applications with the new version. If it all works as expected, use pg_upgrade[1] to migrate everything to the new instance and switch the ports after the migration.
I have only done this migration process on Debian systems but in my experience it worked fantastic!
For small databases with lax uptime requirements I find it pretty easy to upgrade. Just stop the database, run pg_upgrade_cluster (a Debian tool) and create a new replica.
For large databases where you care about uptime upgrading is usually a huge project anyway, since you have to make sure that nothing breaks due to the new version (check query plans, check if the new version changed anything, etc). You also probably want to use logical replication for the upgrade. There are things that PostgreSQL can improve of course.
As an aside I recently was involved in an upgrade from 9.6 where we used logical replication and pg_dump (yes, we really used pg_dump on a 20+ TB database). The reason for using pg_dump was because we have some tables which bloated very fast meaning we could not hold a snapshot open on the master long enough to do the initial data copy so the initial copy was done with pg_dump from a paused replica. And to not lose any data during pg_dump/pg_restore we kept a logical replication slot alive (but unused until pg_restore was complete). That was an interesting project, especially since Googling did not turn up anyone who had done the same.
Not normally, no. A real snapshot is held during the initial creation of the logical slot. After that the slot's "catalog xmin" prevents old catalog (but not user table!) row versions from being removed, but that's somewhat different from a snapshot.
Major version upgrades by definition contain breaking changes. Automatic upgrades across major versions will break things. Still, some distributions do handle this somewhat sanely. For example, Debian and derivatives such as Ubuntu let you have multiple versions installed at the same time and provide a pg_upgradecluster tool to handle the upgrade automatically in most cases. It just won't do it while you are asleep (thankfully).
This will stop working when not enough people adopt .0 versions anymore, because errors from regular usage will not pop up. .0 will become the new beta release that nobody cares about.
I typically update to .0 on my dev machine ASAP, and if there are no problems after a bit I'll update the server to .0 (or usually .1, since that'll usually be out by then).
I think that's a nice compromise; although not a "real" test as the load is a lot less. Also things may work in dev but not production, but unless you explicitly use new features that's rare.
For larger setups you can us canary servers and whatnot, but I prefer to work at small companies and keep my sanity.
For context my ideal setup has at least 3 environments - prod, qa/testing before release, and dev. With dev being the most likely to break, and the qa/testing for when we need to be absolutely certain that differences between prod and dev aren't masking any bugs.
I do like the idea of "canary versioning" some stuff. It's a bit of a game theory issue isn't it, since we all need prod to work at all times and can't risk a .0, and a major version difference between dev and prod can mask some nasty issues in my experience.
Luckily nowdays we have enough brave persons who just take whatever is postgres:latest image from Docker hub.
Unluckily, chances they will report bugs not that high - at least I don't expect bugs related to any moderate size/loaded databases be found by this way.
Just curious, is the issue a time thing, or is it on a system that can't easily upgrade it - or is it a breaking changes thing that would require refactors?
One main challenge is to upgrade a live large database smoothly, without downtime.
I believe minor version upgrades are easily enough since they maintain disk format compatibility and you can simply spin up a new replica server under the new postgres version and then when it's synched with the main database you can make the switch and kill the old version. I'm not so sure about migrating to a new major version (ex: 9.x to 15.x).
Thing is, in this specific context, as I understand it, this is much easier to do post 9.x.
Notably, Postgres has more formal support for logical replication. Logical replication is distinct from the normal replication in that it forwards actual SQL to the replica rather than binary write ahead logs.
The SQL is portable across versions more than WALs are. So with logical replication you can copy a live db to a new version. Then you just need some (ideally) brief down time to swap out the servers.
There are solutions for 9.x, but it got much better later on.
My experience is mostly that you either upgrade because it has a feature you need (then you try it out fast), if it has some performance or other nice thing (then you go for the .1) or you don't really care then you upgrade when your current version goes EOL or if you upgrade your underlying server OS (if we assume something like Ubuntu LTS, and they'd go with 14 instead of 12 or something).
Our current policy is to update to the N-1 release. So we're currently running v13 since v14 was the latest but will soon upgrade to v14 since v15 is now the latest.
We try to upgrade at least once a year, during our slow season (ed-tech, so summer time is < 1/10th peak traffic).
We use AWS Aurora with Postgres and we autoscale our cluster. Because most upgrades have a few perf improvements we have found that we save a bit of money every time we upgrade, either in reduced CPU (which translates to running fewer instances) or in better memory efficiency (which boosts buffer_cache_hit_ratio and therefore reduces IO).
It takes us a few months to upgrade, with planning, testing, co-ordinating downtime, etc. We upgrade every other year, to keep us from falling too far back.
Very excited for this year, as we'll be using logical replication to migrate, which should make this much less risky. (since the old systems are untouched).
In the past, we used pg_upgrade for in place upgrades, and even though we took snapshots before, still seemed a bit stressfull when it happened.
As long as you stay well ahead of that, you are fine.
However, I'd strongly recommend testing your applications with new versions as they come out, or in beta (even better). You may learn about new features, find bugs in your application, find bugs in your migration processes, etc. If you bring any problems to the community right away, then by the time you actually want to upgrade, the problems may already be solved in the newer release.
According to https://www.postgresql.org/support/versioning/ the "Final Release" (whatever that means) for v14 is November 2026, and for v15 is November 2024. I wonder if it's a typo or some LTS thing.
I manage two separate projects at my work, I would never upgrade to a .0 update, but we do every update it either when there's features we want, over every ~2~3 years for the general improvements.
We use django and some other stuff that has very decent support for postgres, and in my experience postgres updates rarely have breaking changes that require much if any changes for us. This also keeps us from being "trapped" on an old version.
I reiterate from all my systems management - do not update to a .0 release of anything unless there is a critical security update and you're forced to.
EDIT: Oh directly for the question - eh depends. Every 2 to 3 max, and usually after it's tested by others more.
And new projects generally use whatever is newest unless it's a .0 release.
Note that vendors know this and some release few patch versions quickly with only few patches to make people move to newest version (and get the community test it by using it). No idea if Postgres is one of these.
Depends on the app but usually don't major version upgrade unless the current version is losing support or there's a compelling new feature. I wouldn't run on the latest until a few point releases are out (say .3 or .4).
I haven't worked on anything recently with a good performance testing system so that weighs in a bit, too (there's not a high degree of confidence the app won't hit a weird perf regression or edge case)
If it's using a managed service, that weighs in, too. Something like AWS RDS Postgres usually has a 6 month lag + give it some additional time to work out any AWS control plane "surprises"
Corruption of database storage is the biggest fear so most upgrades are done cautiously. Historically postgres versions were released much much slower so the upgrade cadence was slower too. That said, we've been upgrading more frequently, especially since a lot of performance features are being released. So far our upgrades have been executed flawlessly for our cluster of postgres servers (~30 postgres chains each made up of 4x large dedicated servers).
Postgres has released a new major version each year for a long time.
The only change is that versions used to be X.Y.Z, and now it just used X.Y. Now, X gets incremented for every major version, so it seems like it's moving faster.
Before version 10, the versioning scheme was different, and the first number was increased about once every five years: for example, 9.6 and 9.5 were consecutive yearly major releases. Version 7.0 was released around year 2000.
This probably depends on your database, but with Postgres we usually don't stay on the old version for too long as the updates are usually very smooth and painless.
Mostly in the 1TB range and usually by doing the replication / promotion dance. I know it gets more complicated if you have larger ones or with zero write downtime but in our case that's usually acceptable.
In my experience, it’s always been a huge decision to upgrade based on lots of analysis and factors. I have never worked in a company where the team routinely upgrades when a major version comes out, as it generally doesn’t make business sense. Just my 2c…
MERGE feature is interesting. But specifically on the revoking CREATE permissions for the public (or default) schema, this is a step in the right direction. Some of the defaults in Postgres can be more secure. For example, the first time I use a POSTGRES_PASSWORD to configure a password, changing this password involves more steps than just changing the values of the ENV, because it doesn't take the changed value there after.
Structured logging with JSON is going to improve a lot of debugging, again a great productive change.
Also, any idea when the docker image for Postgres 15 will be available?
Looking at the Alpine docker file(s) for Postgres you might be able to use the one for the release candidate and set en environment variable of PGVERSION=15.0 which should use https://ftp.postgresql.org/pub/source/v15.0/ here.
You would need to figure out what the package name is on Debian (if it even exists yet?) it's currently set to ENV PG_VERSION 15~rc2-1.pgdg110+1
PostGIS 3.3.0 mentioned another improvement in this release: "This version of PostGIS can utilize the faster GiST building support API introduced in PostgreSQL 15."
As an aside, I've been trying to learn basic GIS with PostGIS and QGIS and it's been quite frustrating. I had a dataset of roads which were broken up into short segments, which I wanted to merge back together based on a key. Theoretically that's a single simple operation, in practice it was getting hung up on something I couldn't understand and it took all afternoon. My usual practice of JIT doc reading wasn't working well, too many unfamiliar terms and missing fundamentals. If you have any recommendations for books or docs I'd love to hear them.
"PostGIS in Action" is a good option. Also check out https://locatepress.com/ which has a few more books in this niche.
> a dataset of roads which were broken up into short segments, which I wanted to merge back together based on a key. Theoretically that's a single simple operation
PostGIS provides the ST_MakeLine aggregate function for this, but you need to write the query such that the GROUP BY query retains the correct order. Creating a new line segment out of many line segments effectively means breaking the lines into their constituent points and then creating a new linestring based on the points. For things like GPS data, you can order by timestamp. But for other cases? You've got to write your aggregate query carefully so that adjacent line segments are actually meant to be merged.
Just some feedback for releases of any software: I think apt sources and repositories should be ready to go on launch and PR-release so people can immediately use the new version. Looks like that's going to take 2-3 days. Sources are available but that's not something most people want to delve in with make files and dependencies. Something like postgres is huge. Right now, if you go to downloads and expect postgresql-15 available, it is not; lot of people on IRC and elsewhere on Twitter are confused where to download postgresql-15. I know that takes time, so the PR release should just be delayed until apt sources are ready. May be also docker repositories.
> Just some feedback for releases of any software: I think apt sources and repositories should be ready to go on launch and PR-release so people can immediately use the new version.
Normally that's the case - we "wrap" the release on Monday so that packagers have time till Thursday to get packages ready. Looks like something didn't quite work out this time. Looking into what went wrong.
Part of it is that a list of supported versions on the windows, macos download pages weren't updated, despite the 15 being available. But unfortunately the Debian / Ubuntu packages are indeed not yet ready.
> May be also docker repositories
postgresql.org doesn't currently provide docker containers to my knowledge.
I'm a little bit confused on the "sorting perf improvements" bit. Does that mean that if I have a query with a `SORT BY`, it will literally "just be faster"? Surely that sounds too good to be true...?
To give credit where credit is due this was not just a Microsoft contribution. The four mentioned contributors were all from different companies: Microsoft, Dalibo, Greenplum and EnterpriseDB. Microsoft employs some of the core contributors of PostgreSQL, but many patches come out of cross company collaboration.
Could someone give some examples on their own domains where the MERGE command is a huge QOL improvement over what's currently available?
I see a lot of people being so very happy in the comments, and, well, I've tried to think long and hard about how to apply it to my current domain but was a bit at a loss... Maybe some practical examples can help?
WITH results as (
SELECT ...
),
inserted_rows AS (
INSERT ...
SELECT * FROM results
...
),
deleted_rows AS (
DELETE ...
USING results
...
)
UPDATE ...
FROM results
Glad to see all the new regex functions. I recently moved a database from AWS Redshift to Postgres on Heroku and was shocked to see how many functions like regexp_substr() weren't available. Wish this had come sooner so I didn't have to rewrite so many of my queries.
> Queries using SELECT DISTINCT can now be executed in parallel.
This sounds quite interesting, but I would assume it does not always work? I didn't see this mentioned in the linked documentation, does someone know when/how the parallel distinct works?
Couldn't tell you the when, but I can tell you the how is likely how you'd expect.
Generally speaking, to do distinct you need a dictionary to look up previously seen values. To do it in parallel you need to make that dictionary thread safe.
For Java, such a thread safe dictionary is made by segmenting the table and synchronizing on the segments. So you'd hash your values, figure out which segment that targets, lock that segment, and then read/update that segment to contain the new value.
I'd assume that postgres is doing a fairly similar trick, The only additional synchronization would be on a linked list of found values. In that case, you could either lock the list and update as new values come in, you could sort those values after the fact, or you could employ a lock free algorithm to add nodes to the list (see lock free queue implementations).
Most parallel operations in PG are implemented by simple merge the dataset, work independently and merge the results. I expect the new distinct to behave the same and not work on a shared data structure.
Thanks! May be helpful to include this in the documentation, since I guess it will then often depend on the numDistinctRows estimate [1] if the parallel plan is used.
Interestingly, there are no breaking changes that were required to be addressed by Hasura GraphQL Engine to support Postgres 15. Hasura is fully compatible with this release, with the potential of adding the MERGE command via the GraphQL API soon.
Excited about the incremental performance improvements and making more secure defaults by revoking CREATE permission for public schema for non-superusers.
I hope that someday it will become as popular as MySql. Although I see complicated, since many companies use other alternatives and in my experience it is complicated to make the migration when you have many years using the same technologies.
This is useful for schemas that use a denormalized tenant id across multiple tables, as might be common in a multi-tenant application:
This schema has a problem. When you delete a user, it will try to set both the tenant_id and author_id columns on the posts table to NULL: When we delete a user, we really only want to clear the author_id column in the posts table, and we want to leave the tenant_id column untouched. The feature I added is a small syntax extension to support doing exactly this. You can provide an explicit column list to the ON DELETE SET NULL / ON DELETE SET DEFAULT actions: I initially encountered this problem while converting a database to use composite primary keys in preparation for migrating to Citus [2], and it required adding custom triggers for every single foreign key we created. Now it can be handled entirely by Postgres![1]: https://www.postgresql.org/message-id/flat/CACqFVBZQyMYJV%3D...
[2]: https://www.citusdata.com/