by itslennysfault 870 days ago
I would never advise this. I use UUIDv4 for basically everything. It adds minimal overhead to small systems and adds HUGE benefits if/when you need to scale. If you need to sort by creation date use a "created" column (or UUIDv7 if appropriate).

If your system ever becomes distributed you will sing the praises of whoever chose UUID over an int ID, and if it never becomes distributed UUID won't hurt you.

Note: this is for web systems. If it's embedded systems then the overhead starts to matter and the usefulness of UUID is probably nil.
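For anyone wondering what the UUIDv7 option buys you for sorting, here is a minimal sketch of the layout as I understand it from RFC 9562: a 48-bit millisecond timestamp up front, then version and variant bits, then random bits. The function name `uuid7_sketch` and its `ts_ms` parameter are made up for illustration; this is not a library API, and a real implementation would also deal with ordering inside the same millisecond.

```python
import os
import time
import uuid


def uuid7_sketch(ts_ms=None):
    """Build a UUIDv7-style value (hypothetical helper, not a library call).

    Layout: 48-bit Unix timestamp in ms | 4-bit version (7) | 12 random bits
            | 2-bit variant (0b10) | 62 random bits.
    """
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 bits from the OS CSPRNG
    rand_a = rand >> 68                # top 12 bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 bits

    value = (ts_ms & ((1 << 48) - 1)) << 80
    value |= 0x7 << 76                 # version nibble
    value |= rand_a << 64
    value |= 0b10 << 62                # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


# IDs minted at later timestamps compare greater, so sorting by the ID
# roughly follows creation order without a separate "created" column.
earlier = uuid7_sketch(ts_ms=1_000)
later = uuid7_sketch(ts_ms=2_000)
```

Two IDs generated in the same millisecond are ordered only by their random bits here, so a "created" column is still the safer choice if exact ordering matters.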


It is worth mentioning that the reason UUIDv4 is strictly forbidden in some large decentralized systems is the myriad cases of collisions because the "random number" wasn't quite as random as people thought it was. There are far too many cases of people not using a cryptographically strong RNG, either carelessly or because they didn't know they needed one.

Less of an issue if you have total control of the operational environment and code base, but that is not always the case.

How does this happen? Are people implementing UUIDv4 themselves using rand() or equivalent? Or have widely used UUIDv4 libraries had such bugs?

It comes in a couple of common flavors. Most commonly it is people rolling their own implementation on top of a non-cryptographic PRNG or similar. Not every environment has a ready-made UUIDv4 implementation, and not all UUIDv4 implementations in the wild are strict. A rarer horror story I've heard a couple of times is discovering that the strong RNG provided by the environment is broken in some way. Both of these cases are particularly problematic because they are difficult to detect operationally until something goes horribly wrong.
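To make the first flavor concrete, here is a small sketch of a hand-rolled UUIDv4 on top of a non-cryptographic PRNG. The helper `weak_uuid4`, the two "workers", and the seed value are all made up for illustration; the assumption is that both workers end up with the same seed (for example, seeding from the start time in seconds).

```python
import random
import uuid


def weak_uuid4(rng):
    """Hand-rolled UUIDv4 (hypothetical) that takes its bits from a seeded PRNG."""
    # version=4 makes uuid.UUID set the version and variant bits for us,
    # so the result looks like a perfectly valid UUIDv4.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two hypothetical workers that happen to seed from the same value.
seed = 1_700_000_000
worker_a = random.Random(seed)
worker_b = random.Random(seed)

ids_a = [weak_uuid4(worker_a) for _ in range(3)]
ids_b = [weak_uuid4(worker_b) for _ in range(3)]

# Same seed, same stream: every "random" ID collides across the workers.
collided = ids_a == ids_b

# For comparison, CPython's uuid.uuid4() draws from os.urandom().
strong = {uuid.uuid4() for _ in range(1000)}
```

Nothing about the weak IDs looks wrong in isolation, which is why this tends to go unnoticed until the duplicates show up.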

The main reason non-probabilistic UUID-like types are used for high-reliability environments is that it is easy to verify the correctness of the operational implementation. It isn't that difficult to deterministically generate globally unique keys in a distributed system unless you have extremely unusual requirements.
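One way to read "deterministically generate globally unique keys" is a node ID plus a per-node counter packed into a 64-bit integer. This is only a sketch under stated assumptions: the class name, the 16/48 bit split, and the idea that node IDs are assigned uniquely out of band are all mine, and the counter would have to be persisted across restarts for the guarantee to hold.

```python
import threading

NODE_BITS = 16      # assumed split: up to 65,536 nodes
COUNTER_BITS = 48   # and 2**48 IDs per node


class NodeIdGenerator:
    """Hypothetical generator: unique as long as node IDs are unique
    and the counter never goes backwards (e.g. it is persisted)."""

    def __init__(self, node_id, start=0):
        if not 0 <= node_id < (1 << NODE_BITS):
            raise ValueError("node_id out of range")
        if not 0 <= start < (1 << COUNTER_BITS):
            raise ValueError("start out of range")
        self._node_id = node_id
        self._next = start
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            counter = self._next
            if counter >= (1 << COUNTER_BITS):
                raise OverflowError("counter exhausted for this node")
            self._next += 1
        # High bits identify the node, low bits are the local sequence.
        return (self._node_id << COUNTER_BITS) | counter


node_1 = NodeIdGenerator(node_id=1)
node_2 = NodeIdGenerator(node_id=2)
ids = [node_1.next_id() for _ in range(1000)] + [node_2.next_id() for _ in range(1000)]
```

The point is the verifiability: uniqueness follows from two checkable invariants rather than from the quality of a random source.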

It adds a lot of overhead at any scale; it's just that the overhead is hidden by the absurd speed of modern hardware.
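To put rough numbers on just the key-width part of that, here is a quick comparison of a 64-bit integer key against a UUID in binary and text form. It says nothing about index locality or page splits, which are a separate cost; the dictionary labels are mine.

```python
import struct
import uuid

# A signed 64-bit integer key, as a BIGINT column would store it.
bigint_key = struct.pack(">q", 1234567890)

u = uuid.uuid4()

sizes = {
    "bigint": len(bigint_key),    # 8 bytes
    "uuid_binary": len(u.bytes),  # 16 bytes
    "uuid_text": len(str(u)),     # 36 characters with hyphens
}
```

That width is paid again in every index and foreign key that carries the key.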

I’ll again point out (I said this elsewhere in a post today on UUIDs) that PlanetScale uses int PKs internally. [0] That is a MASSIVE distributed system, working flawlessly with integers as keys. They absolutely can scale, it just requires more thoughtful data modeling and queries.

[0]: https://github.com/planetscale/discussion/discussions/366

GitHub also uses int PKs and has over 100,000,000 users.