| HN Mirror

Hacker News


	by hot_gril 797 days ago
	UUIDs should not be used as database primary keys unless the DBMS recommends it or you have a well-studied special reason for it. Postgres and MySQL are meant to use sequential bigint keys by default (bigserial in Postgres, AUTO_INCREMENT in MySQL), and so is Citus. Some special sharded DBMSes like Spanner need non-sequential pkeys, but even Spanner explicitly tells you to use uuid4 because k-sortable keys cause hotspotting: https://cloud.google.com/spanner/docs/schema-design#uuid_pri...

3 comments

bruce511 796 days ago

I understand the performance implications of using a UUID for a primary key. And if performance is your primary concern, then this is good advice for large tables.

But if I could go back 25 years and only give myself one bit of advice, it would be to use UUIDs as the primary key. Because in a different context to raw performance, it offers a lot of advantages.

While there are advantages in numerous areas, I'll focus on one for this post: distributed data.

We started by running a database on-prem. Each branch or store got its own db. 15 years later always-on networking happened. 15 years after that, all businesses have fibre.

So now all the branches use a giant shared online database. With merged data. With UUID-based keys this task would be trivial. With bigint-based keys, yeah, it's not.

Along the same timeline data started escaping from our database. It would go to a phone, travel around a bit, change, get new records, then come home. Think lots of sales folk, in places without reception, doing stuff.

So you're right in the context of a single database (cluster) which encompasses all the data all the time.

But in the context where data lives beyond the database, using uuids solves a lot of problems.

There are other places as well where uuids shine.

So as with most advice when it comes to SQL, I'd add "context matters".

hot_gril 796 days ago

When data lives beyond the database, you need a uuid, but it doesn't need to be your pkey. Even your typical backend-frontend app with a single DB will often send uuids over the API.
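A minimal sketch of that layout, assuming SQLite as a stand-in for whatever DBMS is in use. The table and column names (`customer`, `public_id`) and the helper functions are hypothetical, chosen only for illustration: the sequential integer stays the primary key used for joins, while a separate unique uuid column is what leaves the database.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id        INTEGER PRIMARY KEY,   -- internal sequential key, used for joins
        public_id TEXT NOT NULL UNIQUE,  -- uuid that is exposed outside the database
        name      TEXT NOT NULL
    )
    """
)

def add_customer(name: str) -> str:
    """Insert a row and return the uuid that callers outside the DB will see."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO customer (public_id, name) VALUES (?, ?)",
        (public_id, name),
    )
    return public_id

def find_customer(public_id: str):
    """Look a row up by its external uuid; returns (id, name) or None."""
    return conn.execute(
        "SELECT id, name FROM customer WHERE public_id = ?",
        (public_id,),
    ).fetchone()

pid = add_customer("Ada")
row = find_customer(pid)
```

The unique constraint on `public_id` gives it its own index, so lookups by uuid stay indexed while foreign keys elsewhere can keep pointing at the compact integer `id`.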

If you're copying a DB, mutating, then merging back in, you just have to reset the bigint pkeys. I can see how in some contexts that might be less convenient (or if merges are very frequent and reads are not, less performant), but that's a special case and not something to assume from the start. For example I've done merges like this before pretty easily with bigints, and I've also been in places where they start out with uuid pkeys then never benefit.
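As a rough illustration of what "resetting the bigint pkeys" involves, here is a sketch assuming two SQLite databases with a hypothetical two-table schema (`customer`, `orders`). Every key from the incoming branch is shifted past the target's current maximum, and each foreign key has to be shifted by the same offset as the table it points at. This is one simple renumbering strategy, not necessarily the one used in the merges described.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    item        TEXT NOT NULL
);
"""

def make_branch(customers, orders):
    """Build one branch database with explicit integer keys."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", customers)
    db.executemany(
        "INSERT INTO orders (id, customer_id, item) VALUES (?, ?, ?)", orders
    )
    return db

def merge_branch(target, source):
    """Copy source rows into target, renumbering keys to avoid collisions."""
    cust_offset = target.execute(
        "SELECT COALESCE(MAX(id), 0) FROM customer"
    ).fetchone()[0]
    order_offset = target.execute(
        "SELECT COALESCE(MAX(id), 0) FROM orders"
    ).fetchone()[0]
    for cid, name in source.execute("SELECT id, name FROM customer"):
        target.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)",
            (cid + cust_offset, name),
        )
    for oid, cid, item in source.execute(
        "SELECT id, customer_id, item FROM orders"
    ):
        # The foreign key moves by the customer offset, not the order offset.
        target.execute(
            "INSERT INTO orders (id, customer_id, item) VALUES (?, ?, ?)",
            (oid + order_offset, cid + cust_offset, item),
        )
    target.commit()

branch_a = make_branch([(1, "Ada"), (2, "Bob")], [(1, 1, "pen"), (2, 2, "ink")])
branch_b = make_branch([(1, "Cy"), (2, "Di")], [(1, 2, "paper")])
merge_branch(branch_a, branch_b)

merged = branch_a.execute(
    "SELECT c.name, o.item FROM orders o "
    "JOIN customer c ON c.id = o.customer_id ORDER BY o.id"
).fetchall()
```

With two tables the remapping is a few lines; each additional table that references `customer` or `orders` needs its own offset handling.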

bruce511 796 days ago

Bearing in mind that the primary key and the clustered key are not necessarily the same thing, your point stands that the uuid does not need to be the clustered key.

Renumbering bigint primary keys, so as to effect a one-time merge, becomes substantially less trivial when the desire for minimal downtime, coupled with hundreds of related tables and tens of sites, is in play.

hot_gril 796 days ago

Yeah, I can see that

perfectspiral 796 days ago

How do you know it would have worked out better with UUIDs? Did you load test it? What's the size of your dataset?

bruce511 796 days ago

With bigint primary keys the process starts with taking the old site offline, and ends with bringing the new site online.

In between is a non-trivial renumbering step, which takes measurable time and invalidates all existing backups.

By contrast, uuid-based databases do not need this step, and all existing data (some already distributed, some in backups etc.) remains valid.
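For comparison, a sketch of the same kind of merge when every key is a uuid, again assuming SQLite and a hypothetical `customer`/`orders` schema. Rows are copied across unchanged, so keys held in backups or on devices still refer to the same records afterwards.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id          TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL REFERENCES customer(id),
    item        TEXT NOT NULL
);
"""

def new_id() -> str:
    return str(uuid.uuid4())

def make_branch(customers):
    """customers is a list of (name, [items]); keys are generated locally."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    for name, items in customers:
        cid = new_id()
        db.execute("INSERT INTO customer VALUES (?, ?)", (cid, name))
        for item in items:
            db.execute("INSERT INTO orders VALUES (?, ?, ?)", (new_id(), cid, item))
    return db

def merge_branch(target, source):
    """Copy rows verbatim; no key or foreign key is rewritten."""
    for table in ("customer", "orders"):  # parents before children
        rows = source.execute(f"SELECT * FROM {table}").fetchall()
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            target.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", rows
            )
    target.commit()

branch_a = make_branch([("Ada", ["pen"])])
branch_b = make_branch([("Cy", ["paper", "ink"])])
ids_before = {r[0] for r in branch_b.execute("SELECT id FROM orders")}
merge_branch(branch_a, branch_b)

merged = branch_a.execute(
    "SELECT c.name, o.item FROM orders o "
    "JOIN customer c ON c.id = o.customer_id ORDER BY c.name, o.item"
).fetchall()
ids_after = {r[0] for r in branch_a.execute("SELECT id FROM orders")}
```

The sketch relies on uuid4 collisions being negligible; it does not address conflicting edits to the same row, which is a separate problem from key assignment.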

j16sdiz 796 days ago

UUID primary keys remove hotspots; sequential primary keys increase locality.

Depending on your access pattern, you may prefer one or the other, even on the same DBMS.
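The trade-off can be seen in a toy model, assuming a sorted Python list as a stand-in for the key order of an index. It is not how any particular DBMS engine works, and `tail_insert_fraction` is a hypothetical helper. Sequential keys always land at the end of the index (good locality, one hot spot), while random uuid4 keys scatter across it.

```python
import bisect
import random
import uuid

def tail_insert_fraction(keys):
    """Fraction of inserts that land at the very end of a sorted index."""
    index = []       # sorted list standing in for index key order
    tail_hits = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            tail_hits += 1
        index.insert(pos, key)
    return tail_hits / len(index)

rng = random.Random(42)  # fixed seed so the run is repeatable
sequential_keys = list(range(1, 1001))
random_keys = [
    uuid.UUID(int=rng.getrandbits(128), version=4).bytes for _ in range(1000)
]

seq_fraction = tail_insert_fraction(sequential_keys)
uuid_fraction = tail_insert_fraction(random_keys)
print(f"sequential keys appended at tail: {seq_fraction:.1%}")
print(f"uuid4 keys appended at tail:      {uuid_fraction:.1%}")
```

Whether concentrating writes at the tail helps or hurts depends on the storage engine and on whether the workload is write-heavy or scans recent rows.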

hot_gril 796 days ago

Yes, but that decision has to be well-researched.

stephenr 795 days ago

I can't speak for PG, but MySQL at least has a built-in function to resolve the time-ordering issue when storing v1 UUIDs (and a corresponding function to restore them to a valid UUID).
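The functions meant here are presumably MySQL 8.0's `UUID_TO_BIN()` and `BIN_TO_UUID()` called with the swap flag set to 1; that is a reading of the comment, not something it states. The sketch below mimics the byte rearrangement in Python, with hypothetical helper names: the high timestamp bits are moved to the front so that the stored bytes of v1 UUIDs sort in creation order.

```python
import uuid

def uuid_to_bin_swapped(u: uuid.UUID) -> bytes:
    """Reorder time_low / time_mid / time_hi so the high time bits lead."""
    b = u.bytes  # layout: time_low(4) time_mid(2) time_hi_and_version(2) rest(8)
    return b[6:8] + b[4:6] + b[0:4] + b[8:]

def bin_to_uuid_swapped(b: bytes) -> uuid.UUID:
    """Inverse of uuid_to_bin_swapped: restore the standard field order."""
    return uuid.UUID(bytes=b[4:8] + b[2:4] + b[0:2] + b[8:])

def v1_from_timestamp(t: int) -> uuid.UUID:
    """Build a v1-shaped UUID from a 60-bit timestamp (node and clock fixed)."""
    time_low = t & 0xFFFFFFFF
    time_mid = (t >> 32) & 0xFFFF
    time_hi_version = ((t >> 48) & 0x0FFF) | 0x1000
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0, 0))

earlier = v1_from_timestamp((1 << 32) - 1)
later = v1_from_timestamp(1 << 32)

# Raw bytes start with time_low, so the earlier UUID sorts after the later one.
raw_order_ok = earlier.bytes < later.bytes
# After the swap, byte order follows timestamp order.
swapped_order_ok = uuid_to_bin_swapped(earlier) < uuid_to_bin_swapped(later)

sample = uuid.UUID("6ccd780c-baba-1026-9564-5b8c656024db")
stored = uuid_to_bin_swapped(sample)
```

This only helps for time-based (v1) UUIDs; rearranging the bytes of a random uuid4 does not make it sortable.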
