Hacker News new | ask | show | jobs
by smoyer 1619 days ago
"Bad for performance as primary keys" -> "Bad for performance as primary keys in MySQL". This isn't an issue in PostgreSQL and perhaps the lesson here is that as you scale, you need to understand more about the internals of the DB system you've chosen. This isn't limited to RDBMS as it's pretty easy to show trade-offs in choosing a NoSQL as well.
4 comments

You are right, the internals are important. When I tested this, a long time ago [1] [2], I found that randomly distributed keys on PostgreSQL were indeed faster than e.g. on MySQL. Which surprised me! I still don't quite understand why. But, even with PostgreSQL, sequential (or nearly sequential) are much faster.

[1] https://markmail.org/thread/3jzqjy6cavxcrpbq [2] https://markmail.org/download.xqy?id=3jzqjy6cavxcrpbq&number...

> Which surprised me! I still don't quite understand why.

Because MySQL (specifically innodb) will cluster by the PK by default, while Posgres will not.

Meaning in MySQL, by default, the table will be stored in PK order, so when you insert an UUID at a random location any record which sorts after it has to be moved to make room, for nothing since UUID ordering is not really relevant (at least for the current standard UUIDs).

In pg, clustering is not a property (at least for now) it's an explicit point operation, which is not performed by default. So when you insert a row whether UUID or INTEGER-keyed, it's just tacked on wherever there's free space. The UUID still has to be inserted at the right place in the index' btree which can lead to a lot of cascading changes, and it's more expensive to compute, store, and load than a simple integer (especially 32b), but it doesn't induce an arbitrarily high amount of shuffling in the storage layer of the table itself.

While the problem on the article is less of an issue in Postgres (the indexing cache locality is still there), they are still slower than the serial ones.

I don't know if you save enough problems by using them as alternative keys in Postgres for it to be faster, my guess is that just using them as primary key would be faster than a serial primary key and an UUID alternative one. Still, UUIDs are much more useful as client-facing data than as a pure database value, so I would also do that (and standardize my PK format), probably paying a performance penalty.

"Slower" is a matter of context. Like with most things, there are trade offs to allow for more concurrency. UUIDs are intended to avoid serialization with ID generation, which makes sense in a federated system. Yes, they take up more space and aren't all neatly ordered, but the trade off works out well for the right kind of systems (FlakeIDs are a better trade off for more contexts). The problem this article is describing is you're getting the worst of both worlds, because you're still serializing in the MySQL database...
They are still generally found to be slower to use than sequential primary keys in postgresql.
There are remedies for that while still keeping the advantages of UUIDs. https://www.2ndquadrant.com/en/blog/sequential-uuid-generato...
What if you convert from MySql to postgres? Will the uuids still benefit from postgres optimizations?