Hacker News new | ask | show | jobs
by zetalyrae 855 days ago
The point about the storage size of UUID columns is unconvincing. 128 bits vs. 64 bits doesn't matter much when the table has five other columns.

A much more salient concern for me is performance. UUIDv4 is widely supported but is completely random, which is not ideal for index performance. UUIDv7[0] is closer to Snowflake[1] and has some temporal locality but is less widely implemented.

There's an orthogonal approach which is using bigserial and encrypting the keys: https://github.com/abevoelker/gfc64

But this means 1) you can't rotate the secret and 2) if it's ever leaked everyone can now Fermi-estimate your table sizes.

Having separate public and internal IDs seems both tedious and sacrifices performance (if the public-facing ID is a UUIDv4).

I think UUIDv7 is the solution that checks the most boxes.

[0]: https://uuid7.com/

[1]: https://en.wikipedia.org/wiki/Snowflake_ID

8 comments

> The point about the storage size of UUID columns is unconvincing. 128 bits vs. 64 bits doesn't matter much when the table has five other columns.

But it's not just the size of that one column, it's also the size of all the places that id is used as a FK and the indexes that may be needed on those FK columns. Think about something like a user id that might be referenced by dozens or even hundreds of FKs throughout your database.

...and this has not just a size impact but also a substantial performance impact.
> 128 bits vs. 64 bits doesn't matter much when the table has five other columns.

!!!!

But those 5 other columns are not indexed.

---

There are three levels of database performance:

1. Indices and data fit in memory.

2. Indices fits in memory, data does not.

3. Neither indices not data fit in memory.

If you can do #1 great, but if you don't have that, fight like a madman for #2.

---

Doubling your index sizes is just makes it harder.

think of the primary keys in a database like typedef void* ie it's your fundamental pointer and the size of it will impact every aspect of performance throughout - memory/disk footprint and corresponding throughput bottlenecks, cpu time comparing keys which is what every operation reduces to in the deepest inner-most loops of joins and lookups etc.

when x86-64 cpus were new the performance impact from switching to 64-bit pointers was so bad we had to create x32/ilp32 and the reason .NET still has "prefer 32-bit" as a default even today.

using 128-bit uuids as PKs in a database is an awful mistake

Prefer 32-bit does nothing for modern .NET targets. This is actually the first time I've heard the term being used in many years, even back in .NET Framework 4.6.x days it wasn't much of a concern - the code would be executed with 64-bit runtime as a default on appropriate hosts.
the 32bitpref corflag isn't part of a .net core target since those are always il, it is more properly a runtime concern as it should be

it's still the default in .net as of 4.8.1 (has been since it was introduced in 4.5 roughly coinciding w/java's pointer compression feature which is also still the default today)

The .NET Framework target might as well not exist :D

Many libraries are straight up dropping NS2.0 TFM and overall community tends to perceive the request to still target it very negatively. Sure, Visual Studio still runs some in-process features on top of it, and so do Office add-ins, but it's an active work-in-progress to get rid of any new code having to be written while still targeting it.

So, in that regard, this setting does not matter.

in .net core there's no setting, you target the x32 abi by choosing a runtime or target platform ending in "32" or "x86"
The v7 isn’t a silver bullet. In many cases you don’t want to leak the creation time of a resource. E.g. you want to upload a video a month before making it public to your audience without them knowing.
> There's an orthogonal approach which is using bigserial and encrypting the keys...

Another variant of this approach: https://pgxn.org/dist/permuteseq/

It is also feasible to encrypt the value on display (when placing it in URLs, emails, &c):

https://wiki.postgresql.org/wiki/Pseudo_encrypt

This maintains many of the benefits of sequential indexes and does allow you to change the key. However, if the key is changed, it would break any bookmarks, invalidate anything sent in older emails -- it would have the same effect as renaming everything.

It very much does when you have a ton of FKs (enforced or not) using such a column, and thus indexed and used in many joins. Making it twice as hard for the hot part of an index to fit to RAM is never good for performance, nor for the cloud bill.

If you have a column that is used in many joins, there are performance reasons to make it as compact as possible (but not smaller).

If I’ve learned anything in my 7 years of software development it’s that this kind of expertise is just “blah blah blah” that will get you fired. Just make the system work. This amount of trying to anticipate problems will just screw you up. I seriously can’t imagine a situation where knowing this would actually improve the performance noticeably.
Would it ever make sense to have a uuidv7 as primary key but then anther slug field for a public-id, e.g. one that is shorter and better in a url or even allowing user to customize it?
Yes sure but now you have to handle two ids and guaranteeing uniqueness across machines or clusters becomes hard.
That and a uuid is going to be unique across all tables and objects, whereas a slug will only be unique within a certain subset e.g. users within an organization. I’ve seen a production issue IRL where someone (definitely not me) wrote a query fetching objects by slug and forgot to include the ‘AND parent_slug = xxx’