"Bad for performance as primary keys" -> "Bad for performance as primary keys in MySQL". This isn't an issue in PostgreSQL and perhaps the lesson here is that as you scale, you need to understand more about the internals of the DB system you've chosen. This isn't limited to RDBMS as it's pretty easy to show trade-offs in choosing a NoSQL as well.
You are right, the internals are important. When I tested this, a long time ago [1] [2], I found that randomly distributed keys on PostgreSQL were indeed faster than e.g. on MySQL. Which surprised me! I still don't quite understand why. But, even with PostgreSQL, sequential (or nearly sequential) are much faster.
> Which surprised me! I still don't quite understand why.
Because MySQL (specifically innodb) will cluster by the PK by default, while Posgres will not.
Meaning in MySQL, by default, the table will be stored in PK order, so when you insert an UUID at a random location any record which sorts after it has to be moved to make room, for nothing since UUID ordering is not really relevant (at least for the current standard UUIDs).
In pg, clustering is not a property (at least for now) it's an explicit point operation, which is not performed by default. So when you insert a row whether UUID or INTEGER-keyed, it's just tacked on wherever there's free space. The UUID still has to be inserted at the right place in the index' btree which can lead to a lot of cascading changes, and it's more expensive to compute, store, and load than a simple integer (especially 32b), but it doesn't induce an arbitrarily high amount of shuffling in the storage layer of the table itself.
While the problem on the article is less of an issue in Postgres (the indexing cache locality is still there), they are still slower than the serial ones.
I don't know if you save enough problems by using them as alternative keys in Postgres for it to be faster, my guess is that just using them as primary key would be faster than a serial primary key and an UUID alternative one. Still, UUIDs are much more useful as client-facing data than as a pure database value, so I would also do that (and standardize my PK format), probably paying a performance penalty.
"Slower" is a matter of context. Like with most things, there are trade offs to allow for more concurrency. UUIDs are intended to avoid serialization with ID generation, which makes sense in a federated system. Yes, they take up more space and aren't all neatly ordered, but the trade off works out well for the right kind of systems (FlakeIDs are a better trade off for more contexts). The problem this article is describing is you're getting the worst of both worlds, because you're still serializing in the MySQL database...
this is what i do, any public facing id - url, or js/html that would show an id would be a uuid and then the backend queries are done with primary keys / integers.
Yeah. The main problem is people using them as primary keys in naïve systems like relational databases. You can't just expect a relational database to magically become a distributed system just by using UUIDs. There is a bit more work to do than that.
People use UUIDs for more reasons than just making things distributed. You can generate them client side if that's advantageous, you prevent leaking information about how many records there are in the system and prevent guessing of other potential PKs and potential unauthorized access, and there's some optimization strategies that benefit from not relying on a serial PK.
If you're obfuscating, then you should be using a DHT to map surrogate keys to internal keys that are more convenient for use in a MySQL index... or just stop using MySQL.
I've had occasional uses for it in systems that aren't distributed at all. It's a handy property of UUIDs, if UUIDs are useful for a particular use case, I'm going to use them, what they're 'supposed' to be for is irrelevant.
Somebody thought RabbitMQ did load balancing because it had three redundant nodes. Pretty much ruined the company. Definitely caused all of us to lose our equity.
Calling relational databases (HN’s preferred storage system) naive probably sounds like trolling to most people. There are also plenty of distributed relational databases. People downvote comments that sounds like trolling or flamebait.
I use UUIDs but I don’t know why they would magically make my Postgres a distributed system. I like them because the client can generate them offline.
What? Really? Naïve in this context means a general purpose solution that doesn't "know" about your use case. Have people never heard of a naïve algorithm or solution?
Distributed relational databases aren't naïve in this context. MySQL is.
> Have people never heard of a naïve algorithm or solution?
I have a fairly recent PhD in algorithms and I haven't heard naïve used this way. When I hear naïve, it usually just means "does the immediately obvious thing".
For what you're trying to say, the term I'm familiar with is "oblivious", e.g. "oblivious routing" or "oblivious local search", occasionally with modifiers such as "cache-oblivious".