Hacker News new | ask | show | jobs
by masklinn 2950 days ago
> Is there anyone who can go a little bit more in detail?

* UUIDs are way more painful than serials to recognise, remember, input or transmit especially if you're not dealing with huge tables. "18574" is easy to read/grok, "21caeffa-0fca-4f4e-b845-46ef0576e42a" is not.

* UUID are 128 bit instead of 32 for most serial PKs by default, this may or may not matter. Note that this doesn't just impact the table itself (lowering data locality and thus performanes: less stuff fits into caches) but also any FK as well as wire transmission (where the size explodes as you're going to transmit a hex version of the UUID so always at least 32 bytes, a decimalised u32 maxes out at 10 bytes).

* because UUID are random data they're intrinsically non-ordered (as opposed to serial ids which… are), this means your writes are all over the place and clustering is defeated, this adversely impacts your reads and writes in SQL dbs (some other DB techs especially distributed ones prefer the scattering: https://news.ycombinator.com/item?id=14524174)

The latter can he mitigated by using "ordered UUIDs": you can generate UUID1 (nominally time-based) such that the final value has a sequential "head" and a random "tail", either by taking control over the process or by generating a regular UUID1 and rejiggering it a bit: https://stackoverflow.com/questions/412341/how-should-i-stor...

3 comments

> UUIDs are way more painful than serials to recognise, remember, input or transmit...

Completely agree, but I've found this can be a non-technical "feature" too. Serial integer primary keys are much more susceptible to human error when doing any sort of direct database manipulation.

Make a typo on a integer PK? Wrong user gets deleted. UUID typo? Row not found (almost certainly).

Another source of error I've seen is when someone in sales asks "Hey, can you remove User #1234?", but they really meant Customer #1234." With UUIDs, there's no "collision" between the tables.

Clearly there are better process/tool-based ways to prevent these types of mistakes, but it's a useful side effect of UUIDs.

You can somewhat mitigate the typo problem with integers by encoding them to the outside with parity included.

An 8 bit parity should be able to easily tell any possible typo in a 32 or 64bit integer and even correct errors. You could even put the parity into the lowest 8 bit of the integer.

I'm currently working on this for a project of mine to not only prevent typos but tolerate them by using the parity and using letters that are far apart on my QWERTZ keyboard (bit local but it should work for most keyboard sets and I have parity to fall back on)

Yes but that's a full UUID, using a 64bit integer with some encoding is far shorter.
> UUIDs are way more painful than serials to recognise, remember, input or transmit especially if you're not dealing with huge tables. "18574" is easy to read/grok, "21caeffa-0fca-4f4e-b845-46ef0576e42a" is not.

One of the reasons we use them is because theyre not easily recognized or sequential.

> The latter can he mitigated by using "ordered UUIDs": you can generate UUID1 (nominally time-based) such that the final value has a sequential "head" and a random "tail"

Specifically, in PostgreSQL such UUIDs can be generated using uuid_generate_v1mc()

you can use uuid_to_bin for this, check the docs: https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functi...