Hacker News
by paledot 2088 days ago
The different strategies are interesting, but I don't understand the bit flags. Why would you want your ID to include a record of how it was generated? Some use cases go out of their way to obfuscate that information, and one selling point of UUID over auto increment is that you can't infer missing items or how many are being created by watching PKs.

It's cool that I can make deterministic UUIDs, but it seems silly that the spec should require me to expose the fact that it's deterministic. It's not like my deterministic UUID is any more likely to collide with someone else's random one than two deterministic or two random are to collide with each other.

1 comment

I don't know the “real” historical reason, but the deterministic UUID versions aren't really meant for the case where you want to obfuscate ID generation. They're more useful for shoehorning existing database-key-like information into slots where UUIDs are expected. In fact, I very recently ran into a (thus far hypothetical) situation involving a set of in-memory objects identified and indexed by UUID, where the “real” objects have UUIDs that double as lookup keys for information elsewhere, and where I wanted to deterministically derive a secondary, distinct “fake” object from each “real” one. My mental design sketch used SHA-1 (version 5) UUIDs for this. (Sorry for handwaving over the specifics.) That said, I do think the digest-based UUID versions are a niche case, especially in greenfield projects.
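
A minimal sketch of that kind of derivation, assuming Python's standard `uuid` module. The `FAKE_LABEL` string and the `derive_fake_id` helper are made up for illustration; the only real API used is `uuid.uuid5`, which takes an existing UUID as the namespace and hashes it together with a name using SHA-1.

```python
import uuid

# Hypothetical label for the derived object kind; not part of any spec.
FAKE_LABEL = "fake"


def derive_fake_id(real_id: uuid.UUID) -> uuid.UUID:
    """Derive a distinct, repeatable UUID from an existing one.

    uuid5 hashes the namespace UUID's bytes plus the name with SHA-1,
    then stamps the version (5) and variant bits onto the result.
    """
    return uuid.uuid5(real_id, FAKE_LABEL)


real_id = uuid.uuid4()          # stands in for a "real" object's key
fake_id = derive_fake_id(real_id)
```

The same `real_id` always maps to the same `fake_id`, so nothing extra has to be stored to find the "fake" object again, and the version field marks it as version 5 rather than version 4.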

From another angle, if you look at the difference between version 1 and version 2 time+node UUIDs, they're quite similar, except that version 2 replaces some of the least significant bits of the other data with a UID/GID-like field. So without the version bits, collisions between v1 and v2 would be much more likely. And at that point there's no reason not to use the same flags to mark other generation methods; even if in theory you could decide that some of them are unlikely to collide, that leaves the door open for edge cases and forces everyone to remember which of those decisions were made. Why not be consistent?
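
To make the version-bit point concrete, here is a rough sketch using Python's standard `uuid` module. It does not generate a real version 2 UUID (the module has no generator for that); it just takes a known version 1 value, `uuid.NAMESPACE_DNS`, and re-stamps the same 128 bits as version 2 to show that the version nibble is the only thing keeping the two apart. The `VERSION_SHIFT` constant and the `version_nibble` helper are illustrative names.

```python
import uuid

# The 4-bit version field sits at bits 76..79 of the 128-bit integer.
VERSION_SHIFT = 76


def version_nibble(u: uuid.UUID) -> int:
    """Read the version field straight out of the integer form."""
    return (u.int >> VERSION_SHIFT) & 0xF


# NAMESPACE_DNS is a well-known version 1 (time+node) UUID.
v1 = uuid.NAMESPACE_DNS

# Same bits otherwise, but with the version field forced to 2.
# This stands in for a v2 generator that happened to produce
# identical values in every other field.
v2 = uuid.UUID(int=v1.int, version=2)

# The two values differ only inside the version nibble.
differing_bits = v1.int ^ v2.int
```

Here `differing_bits` is nonzero only inside the version field, so the two values would be the same ID if that field did not exist.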

From the opposite angle, in a case where you're using random UUIDs generated within a trust boundary as lookup keys, the knowledge that one of them is random doesn't tell an adversary on the outside anything useful about the others, because they're all random, so it's a few harmless “wasted” bits. Of course, since UUIDs didn't have “unguessable” as a selling point to start with, if that's something you're leaning on, then go ahead and use something else…
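
For a sense of how few bits are “wasted”, a small sketch with Python's standard `uuid` module: in a version 4 UUID, 4 version bits and 2 variant bits are fixed, which leaves 122 bits for the random part. The mask names below are illustrative.

```python
import uuid

# Fixed fields in an RFC 4122 UUID, as positions in the 128-bit integer.
VERSION_MASK = 0xF << 76   # 4 bits: the version number
VARIANT_MASK = 0x3 << 62   # 2 bits: binary 10 for the RFC 4122 variant

u = uuid.uuid4()

# Count the bits that are the same in every version 4 UUID.
fixed_bits = bin(VERSION_MASK | VARIANT_MASK).count("1")
random_bits = 128 - fixed_bits

version_field = (u.int & VERSION_MASK) >> 76
variant_field = (u.int & VARIANT_MASK) >> 62
```

Every value from `uuid.uuid4()` carries the same six marker bits, so seeing them on one ID says nothing about the remaining bits of any other ID.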