by acmecorps 1462 days ago
You’re not alone. I’ve been migrating my tables from integer keys to UUIDs, and I use UUIDs for every new table unless I have a very good reason not to. Experience was my teacher.

Don’t UUIDs as primary keys totally destroy the performance because UUIDs aren’t sortable and thus wreak havoc with the index for the primary key?
Destroy is a strong word (and UUIDs can certainly be sorted, but locality is an issue), but all of software is a series of tradeoffs. I've used both auto-increment and UUIDs, and wish I had used UUIDs in almost every case.

Distributed generation - no sending a record to a server to get a key then using it to generate other records. In a world with increasing use of services, this becomes more important every day.

System wide unique - helps with logging, debugging, and avoiding general errors.

Multi-master db replication - I know this depends on the RDBMS, but having a unique key on every record avoids clashes. Also super useful during data migrations (which will happen. I have another rant that data always outlives code, so plan accordingly).

Validation - UUIDs have a form that can be a first level validation on input.
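To make the distributed generation and validation points concrete, here is a minimal sketch using Python's standard `uuid` module. The record shape (an order with items) and the function names are hypothetical, chosen only for illustration. The idea is that a client can assign keys to a parent and its children before anything reaches the database, and that malformed input can be rejected by form alone.

```python
import uuid


def new_order_with_items(item_names):
    """Build a parent record and its children with locally assigned keys.

    The order/item shape is a hypothetical example, not from the thread.
    """
    order_id = uuid.uuid4()  # no round trip to a server to obtain the key
    items = [
        {"id": uuid.uuid4(), "order_id": order_id, "name": name}
        for name in item_names
    ]
    return {"id": order_id}, items


def parse_uuid(text):
    """First-level input check: return a UUID, or None if the form is wrong."""
    try:
        return uuid.UUID(text)
    except (ValueError, AttributeError, TypeError):
        return None
```

Note that `uuid.UUID` is lenient about spelling (it also accepts braces, a `urn:uuid:` prefix, and hex without hyphens), so a stricter check may be wanted if only the canonical form should pass.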

For me, those advantages outweigh some extra space usage, possible performance impact, and ugly URLs.

And on performance: if UUIDs do turn out to be the cause of a problem, there are ways to make them more index-friendly.

There is a perf hit. You cannot avoid it when you are lugging that many bytes around. An int/int64 fits into a register easily and the comparison instruction is cheap. However, once you add replication and similar requirements, UUIDs become more desirable because they embed extra information that makes them unique. You can get some of the same benefits with auto-increment, at the cost of more complexity, for example by using a stride similar to the number of machines in the cluster. But even that can get weird and, depending on your db, can be a pain to set up. Use them as a clustered index and you will probably create a hotspot index with poor lookup performance but decent write performance, simply because the items are probably not grouped the same way as your WHERE clause. SSDs have hidden most of this for most databases, but the issues are the same.
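The stride idea can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not how any particular database implements it: the class name and parameters are hypothetical, and a real setup would configure the equivalent offset and increment in the database itself.

```python
class StridedSequence:
    """Auto-increment style counter where each node owns its own slots.

    Hypothetical helper: node i of n hands out i+1, i+1+n, i+1+2n, ...
    so no two nodes ever produce the same value.
    """

    def __init__(self, node_index, node_count):
        if not 0 <= node_index < node_count:
            raise ValueError("node_index must be in [0, node_count)")
        self.stride = node_count
        self.next_value = node_index + 1  # node 0 starts at 1, node 1 at 2

    def next_id(self):
        value = self.next_value
        self.next_value += self.stride
        return value
```

With three nodes, node 0 yields 1, 4, 7 and node 1 yields 2, 5, 8. One reason this "can get weird" is that the stride is fixed up front, so adding a fourth node later means renegotiating the scheme across the cluster.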
They don't utterly destroy performance, but there is some hit if you use UUID v4 random values due to database index scattering. That's why this new proposal exists. It adds new versions of UUID that are mostly incrementing in time, so that they group better.
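As a rough sketch of what "mostly incrementing in time" can look like, the snippet below builds a 128-bit value with a 48-bit millisecond timestamp in the leading bits and random bits after it. The layout is an assumption modeled on the UUID version 7 format; the comment above does not say which version it means, and the function name is hypothetical.

```python
import os
import time
import uuid


def time_ordered_uuid(unix_ms=None):
    """Sketch of a UUID whose leading 48 bits are a millisecond timestamp.

    Assumes a version-7-style layout: timestamp, version, random bits,
    variant, more random bits.
    """
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    # Overwrite the 4 version bits (79..76) with 0b0111.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Overwrite the 2 variant bits (63..62) with 0b10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)
```

Values generated in different milliseconds sort by creation time, so new rows land near each other in the index. Values generated within the same millisecond are ordered randomly, which is why the ordering is only "mostly" incrementing.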
In my experience, that's a big NO. While there is some performance impact, in other ways it can actually be a benefit (spreading writes out to multiple pages, for example), so as with anything regarding performance: benchmark, benchmark, benchmark.

PS: UUIDs also work quite nicely for dealing with certain system hiccups, specifically temporal ones. They could be thought of as something similar to Erlang supervisor trees.

CUID might be a good alternative if you need something sortable over time: https://github.com/ericelliott/cuid

It's mentioned in this IETF draft, but I don't see that the analysis has been made available.