| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by eric4smith 1347 days ago

I had a database table where I used UUID as primary key. Big mistake. Haunts us to this day.

Not sortable. Takes a lot of space. Table relationships are annoying. Etc.

What we do instead is have a secondary UUID key and keep Bigint as primary keys. Then use the UUID column in the external context instead.

UUIDs are fine for 99.99999% of the time in your own domain.

Don’t expect universal uniqueness across all domains.

11 comments

dragonwriter 1347 days ago

The reason to use UUIDs as primary keys is to allow creating records including primary keys outside of the database before posting them, especially in distributed systems.

UUIDs are sortable, but don’t give you creation-order sorting (of course, its abusing bigint PKs to rely on them for that, too.) If you want creation-order sorting, storing a creation timestamp and sorting on that works, and I’ve never had a db that had a business requirement for creation order sorting and didn’t also have one for actual creation time.

Alupis 1347 days ago

I've become of the mindset that most records should have creation and update timestamps - even if you do not currently plan to use them. They might come in handy later, and the code to implement them in most languages/frameworks is trivial anyway.

theptip 1347 days ago

These days ULID (https://github.com/ulid/spec, or any of the variants), or most recently the new UUID versions (https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...) give you creation-order sorting.

sa46 1347 days ago

What's the benefit of having the primary key before the row is stored in the database?

Maybe concurrent inserts to multiple data stores so you don't need to wait for an initial ID from the database. You'd have to trust the client to give you a “good” UUID as well as the normal distributed problems of one of the RPCs failing. I know both Twitter (called Snowflake iirc) and Google have unique ID services for this use case.

dragonwriter 1347 days ago

> What’s the benefit of having the primary key before the row is stored in the database?

If I have a set of linked records in a relational schema representing a complex object, I can create them all with one round trip rather than multiple.

Also, large numbers of clients can insert in that way without contention, whereas if you use a sequence generator for PKs, it becomes a resource around which there is contention when creating rows.

> You’d have to trust the client to give you a “good” UUID

Sure, this lets you scale, e.g., backend service instances (which are db clients) without (as much as otherwise) contention in the db layer, its not usually something you would do with external, untrusted clients.

> I know both Twitter (called Snowflake iirc) and Google have unique ID services for this use case.

Snowflake was one of the inspirations for the newer (draft) UUID versions [0] (though, unlike them, it had a design constraint of fitting into 64 instead of 128 bits.)

[0] https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...

sa46 1346 days ago

> I can create them all with one round trip rather than multiple.

I see. I'd lean towards using common table expressions. The first insert statement returns the primary key and other inserts can depend on the key. I do understand it's not a panacea and composing queries can be problematic.

> a sequence generator for PKs, it becomes a resource around which there is contention

As I understand, Postgres sequences don't block which leads to a different problem [1].

[1]: https://news.ycombinator.com/item?id=27843084

deathanatos 1346 days ago

> Not sortable

That's not correct; its trivial to come up with an ordering, and I don't know of a database in practice that doesn't permit sorting on a UUID.

> Table relationships are annoying

… in SQL,

  user_id REFERENCES users

It's exactly the same, regardless of the type of the column…?

> Takes a lot of space

Yes … but also no. It's 16 B vs. a serial's 4 B, I grant, but compared to a varchar, it's immaterial. (And particular in comparison to the number of times I see people use a varchar for an enum…) Certainly there could be a case where a row is wide b/c of UUIDs, but in practice, rows are wide either because of the data, or because of poor design.

Quarrelsome 1347 days ago

AFAIK UUIDs as pks in databases is extremely standard. I would suggest the biggest downside is debugging where writing id = 5 is much easier than id = 'xxxx-xxxx-xxxx-xxxx'.

vbezhenar 1347 days ago

On the other hand you won’t get wrong result if you wrote wrong table name. Was hit once. Deleted row from the wrong table.

Quarrelsome 1346 days ago

yeah, I seen bugs in multi-db systems where a dataset gets accidentally composed from data from each due to aligning ids. Was a very strange afternoon.

dvnguyen 1347 days ago

Just curious to learn: when do you need sortable primary keys?

winrid 1347 days ago

It makes pagination much more efficient and possible to completely use an index (re: data is pre-sorted) if you have some sort of increasing id, vs sorting in-memory on request. Even having some sort of indexed "createdAt" field, and sorting on that, isn't super reliable for pagination when the dataset is changing. You usually want a stable sort to make pagination nice, so you add an ID to it, but then this gets very inefficient w.r.t index use if the PK is a UUID since it is not monotonic/sortable.

deathanatos 1346 days ago

UUIDs are sortable, and you can do efficient pagination on them. (The results wouldn't be in any meaningful order, but that's orthogonal.)

winrid 1346 days ago

Yeah sorry I had a brain fart and was thinking of stable pagination w/ compound sort+filter. It doesn't matter if id is UUID, sigh.

Where date > y or (date eq y and id > x) order by date, id

This can't be completely fulfilled by an index. It'll have to sort/merge somevresults in memory, very CPU expensive.

More examples here https://www.mixmax.com/engineering/api-paging-built-the-righ...

code_runner 1347 days ago

In sqlserver (and others?) you can have one clustered index. Typically this is the primary key (doesn’t have to be though).

The clustered index is actually the physical order of the data stored on disk, so any new guid causes tremendous amounts of churn.

Not sure how modern this concern is, but I would guess a LOT of SASS companies have to consider this.

Quick edit: sequential GUIDs are obviously a thing and I believe alleviate a lot of these concerns. I do not believe that was an option in sql server 2008 r2 (a version that lived a LONG time).

butlerm 1346 days ago

A clustered index is certainly not actually the physical order of the data stored on the disk, at best it tends to approximate it by placing data associated with similar keys in similar blocks. You would have to actually physically reorganize your entire clustered table or index to get it in physical order, and often you lose performance by doing so.

The performance impact of randomly distributed keys on any ordered index on that key is certainly real though, so I agree with you there.

sicp-enjoyer 1347 days ago

Almost every database algorithm is based on ordering and binary search.

deathanatos 1347 days ago

… and for which UUIDs are sortable.

OP called them "not sortable", which is a poor phrasing, as they're sortable. What he's likely getting at is that they're not ordered by generation time, which matters to some folks.

postalrat 1347 days ago

When you need to get a sorted list or perhaps a range?

When do you need primary keys?

endisneigh 1347 days ago

that doesn't really make sense though. how would you know the range to sort by? presumably you'd use the criterion directly to sort.

a better reason and justification to not use UUIDs is that they take up needless space if you're space constrained. they also have worse performance

postalrat 1347 days ago

Maybe you are doing some sort of cursor for pagination or whatever. You can use one ID and fetch the next 50 records. Then use the last ID in those 50 to fetch the next 50, etc.

endisneigh 1347 days ago

but you can still do pagination with uuids, your rows can be sorted on any arbitrary criteria like created date or updated date.

vanviegen 1347 days ago

But that would be slow, even with an index, as compared to using a clustered primary key. Especially when your data doesn't fit in ram.

postalrat 1347 days ago

Dates might not be unique but I can see other possibilities that would be fine.

JSavageOne 1347 days ago

When is an instance you'd need to sort by ID? How does it make table relationships annoying?

biztos 1347 days ago

I’m not the parent but for me:

1. Sort by ID if you want to see things by insert order and are using serial/autoincrement for the PK.

2. Writing (or reading, or talking about) SQL queries in which the IDs are present.

I solved the first problem by using ULID at one point but in future I’d probably not use a UUID until I had a specific need for it. And as others have mentioned, if that need is about the outside world I’d probably go with an extra column.

BeefWellington 1346 days ago

> 1. Sort by ID if you want to see things by insert order and are using serial/autoincrement for the PK.

Most RDBMSes (possibly all of the big ones?) do this by default if you opt for no ORDER BY clause, however that IS then depending on undefined behavior which is bad.

Kamq 1347 days ago

> Takes a lot of space

Wait, did you store them as strings instead of as a native UUID type (or any sort of 128 bit integer type)?

Alupis 1347 days ago

Storing UUID's in their binary format quickly becomes a PITA for debugging or doing manual queries. Some DB's have functions that can convert to human-readable strings which make this less painful, but it's never going to be as simple as using integer id's.

As with all things, there are tradeoffs that must be considered before building the system.

sgtfrankieboy 1346 days ago

Which DB doesn't support UUID as a type so I can make sure to never use it?

Alupis 1346 days ago

More than you'd think... including the venerable MySQL.

Alupis 1347 days ago

This is a good approach. I wonder if using a UUID-aware data type (like PostgreSQL's UUID type) would improve performance without making the second column necessary?

smileysteve 1347 days ago

But to use both, you have to store both, to use it as a lookup, you have to [unique] index both; you can no longer easily partition (and guarantee uniqueness)

And an auto incrementing bigint doesn't guarantee order.

SAI_Peregrinus 1347 days ago

And if you need universal uniqueness across all domains, just pick a 256-bit random number using a CSPRNG like /dev/urandom.

sicp-enjoyer 1347 days ago

I'm surprised to hear it's not sortable. Why can't they just memcmp it?

deathanatos 1347 days ago

You can and databases do.

What people want is for it to be sortable by generation time, i.e., they want it to share the "larger values were generated later" property of timestamps (but without collisions) or SERIALs (but without the locking on the serial / independent generation).

sicp-enjoyer 1346 days ago

Got it. Thanks for correction.

lyu07282 1347 days ago

Everytime I used uuids before I always ended up with terrible index performance