by floormat 4795 days ago
Sorry for the upcoming stupid question but...

What is the value of this? Why can't unique identification be done using just regular increments on an ID column? Or even a composite key?

5 comments

As pallinder said, it can be very handy: the IDs can be generated by the nodes, not the db server. Very useful in disconnected environments. Imagine being able to create data on a smartphone whilst sitting on the plane, and not having to do anything messy with ID replacement when you sync with the server in the office.
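A minimal sketch of the offline case described above, assuming random (v4) UUIDs and a plain dict standing in for the server table. The helper names `create_offline` and `sync` are made up for illustration; they are not from any library.

```python
import uuid

def create_offline(title):
    # The device assigns the key itself; no round trip to the server is needed.
    return {"id": str(uuid.uuid4()), "title": title}

def sync(server_rows, device_rows):
    # Merge by key. No ID replacement step: the key chosen offline is kept as-is.
    for row in device_rows:
        if row["id"] in server_rows:
            raise ValueError("duplicate id: " + row["id"])
        server_rows[row["id"]] = row
    return server_rows

# Two devices create rows independently, then sync into the same (hypothetical) table.
phone = [create_offline("note written on the plane")]
laptop = [create_offline("note written in the office")]
server = sync(sync({}, phone), laptop)
print(len(server))
```

With an auto-increment column, both devices would have had to invent temporary IDs and rewrite them (plus every foreign key pointing at them) at sync time.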

The cost? The keys are larger, and (unless using a sequential algorithm) are poor candidates for clustered keys (because they force page splits). The impact can be rather large (this led to terrible performance in early versions of SharePoint).

To be clear, MAC-based UUIDs are sequential. PRNG (v4) UUIDs are random and don't work well with indexes.
It's also useful if you don't want your URLs to expose how many of a certain thing you have, whether it be users, posts, payments, etc. A lot of sites let you derive how much activity they have based on how fast their numeric IDs increment. You could use a separate token alongside the pkey to do the same, but this feature just makes it simpler.
Only downside: this makes the URLs very long sometimes.
Imagine a distributed system where you want to preserve uniqueness across the board. Using a UUID more or less guarantees this, by the sheer number of possible values.
UUID v1 uses the MAC address of the computer doing the generation as a part of the UUID, which ensures uniqueness so long as you aren't cloning MAC addresses in your infrastructure.

The downside of this is that it can leak information about the machine that generated the UUID, but if you require deterministic uniqueness, there you go.
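To make the leak concrete, here is a small sketch using Python's standard `uuid` module. `FAKE_MAC` is a made-up node value passed in explicitly; by default `uuid.uuid1()` tries to use the machine's real hardware address instead.

```python
import uuid

FAKE_MAC = 0x001122334455  # made-up 48-bit node value, not a real machine

u = uuid.uuid1(node=FAKE_MAC)

print(u.version)          # the version field of a time-plus-node UUID
print(f"{u.node:012x}")   # the node field, recovered from the UUID itself
print(str(u)[-12:])       # the same 12 hex digits, visible in the printed form
```

Anyone who sees the UUID can read the node field back out, which is the information leak mentioned above.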

Another benefit not mentioned here: sharding becomes much easier. You don't need a central authority that controls the increment.
At the DB level, it facilitates a master-master replication setup by eliminating the auto_increment collision problem. Master-master replication, in turn, allows building distributed applications that can handle net splits reasonably well.
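A toy illustration of the collision problem, assuming two masters that each run their own counter starting at 1 with no coordination. `auto_increment_ids` is a hypothetical stand-in for a per-node auto_increment column, not a real database API.

```python
import itertools
import uuid

def auto_increment_ids(n):
    # Each master keeps its own counter, starting from 1.
    counter = itertools.count(1)
    return [next(counter) for _ in range(n)]

# Two masters accept writes independently during a net split.
master_a = auto_increment_ids(3)
master_b = auto_increment_ids(3)
clashes = set(master_a) & set(master_b)
print(sorted(clashes))  # every ID handed out on A was also handed out on B

# The same writes keyed by random UUIDs generated on each node.
uuid_a = {uuid.uuid4() for _ in range(3)}
uuid_b = {uuid.uuid4() for _ in range(3)}
uuid_clashes = uuid_a & uuid_b
print(len(uuid_clashes))
```

Counter-based schemes can avoid the clash with per-node offsets and strides, but that has to be configured up front; UUIDs need no coordination between nodes.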