by js2 1876 days ago
I'd advise UUID v6 over this, which is at least an RFC 4122 extension. As coded, this isn't UUID compatible other than being 128 bits.

http://gh.peabody.io/uuidv6/

Also some recent similar submissions:

Timeflake is a 128-bit, roughly-ordered, URL-safe UUID.

https://news.ycombinator.com/item?id=25870482

https://github.com/anthonynsimon/timeflake

ULIDs:

https://news.ycombinator.com/item?id=18768909

Sonyflake:

https://news.ycombinator.com/item?id=25592325

KSUIDs (can't find any discussion here):

https://github.com/segmentio/ksuid

This comment lists other prior art:

https://github.com/bradleypeabody/gouuidv6/issues/3


Is there a reason or need to be UUID compatible? I honestly don't know.

I use them in databases and know they're pretty safe to use when integrating data across multiple databases because collisions are astronomically unlikely, if implemented properly.
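
To put a rough number on "astronomically unlikely", here is a back-of-the-envelope sketch using the birthday approximation. It assumes v4 UUIDs (122 random bits) drawn from a good random source; the function name is made up for illustration.

```python
import math

# A v4 UUID has 128 bits, minus 4 fixed version bits and 2 fixed variant bits.
RANDOM_BITS = 122


def collision_probability(n, bits=RANDOM_BITS):
    """Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits))."""
    exponent = n * (n - 1) / (2 * 2**bits)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for n in (10**6, 10**9, 10**12):
        print(f"{n:>15,d} IDs -> p ~= {collision_probability(n):.3e}")
```

The estimate only holds if the generator really is uniform and independent, which is the "if implemented properly" part.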

So here's a real-world use case that having an RFC 4122 UUID was useful for me.

I have a server which accepts reports and stores them in S3. For each new report, a v4 UUID is generated that is used as the base of the S3 object name. This UUID becomes the report ID.

An entire system has been built around the report ID, expecting a hex UUID. Recently, I needed to change how the objects are stored in S3 in order to partition them into multiple S3 prefixes. Instead of storing every object in a single place in the S3 bucket, I needed to do something like:

    "reports/%s/%s" % (report_id[:2], report_id[2:])

The issue was that I had one component that was writing the reports, and a second component reading the reports. And somehow, the reader needed to know whether a report was stored using the old path layout:

    "reports/%s" % (report_id,)

Or the new path layout. I took advantage of the fact that RFC 4122 UUIDs have four bits set aside for version. After generating a v4 UUID, I update its version to 5. The reader can then check the report UUID version to know which path layout to use. Once all the reports stored using the old path layout expire, I can undo this hack.
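
Roughly, the writer and reader sides could look like the sketch below. It uses Python's standard `uuid` module; the function names `new_report_id` and `report_path` are invented for this illustration and are not the actual code from the system described.

```python
import uuid


def new_report_id():
    """Writer side: generate a v4 UUID, then relabel its version field as 5."""
    u = uuid.uuid4()
    # Passing version=5 overwrites only the version and variant bits;
    # the 122 random bits from uuid4() are kept as they were.
    return uuid.UUID(int=u.int, version=5).hex


def report_path(report_id):
    """Reader side: pick the path layout from the UUID version field."""
    if uuid.UUID(hex=report_id).version == 5:
        # New layout: partition by the first two hex characters.
        return "reports/%s/%s" % (report_id[:2], report_id[2:])
    # Old layout: every object under a single prefix.
    return "reports/%s" % (report_id,)


if __name__ == "__main__":
    old_id = uuid.uuid4().hex
    new_id = new_report_id()
    print(report_path(old_id))
    print(report_path(new_id))
```

The report ID stays a 32-character hex string either way, so nothing else that validates the ID format has to change.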

Of course, I could have made the reader try the new path layout, then fall back to the old path layout. Or I could have updated the entire system to have a better way of communicating the path layout, but that would've been less efficient or have meant touching a lot more code.

I guess the moral of the story is: you never know when you're going to need to change something in a backwards compatible way, and having a few bits set aside even for something as simple as an object ID can be useful. I'm fortunate the UUID designers thought of that.

The version bits are for the UUID version though, not for your own versioning use. It doesn't sound like your v5 UUID was actually a UUID v5 (with namespace), so are your new UUIDs even RFC 4122 compliant anymore?

At this point, you could just use one of these not-a-UUID replacements and include a few versioning bits of your own. Of course, you'd have to plan it in advance. My point is that you used a lucky hack, not a feature of UUIDs.

If I hadn't been using RFC 4122 UUIDs, I wouldn't have had any reserved bits to work with. There are a bunch of places in this system that restricted the report ID to be a hex string of length 32 that would have needed updating if I switched to a different format.

There's also no functional difference between a v4 UUID and a v5 UUID with a fixed namespace and random bits for the name portion, so I disagree that what I'm doing is no longer RFC 4122 compliant.
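
As a small illustration of that claim, the sketch below builds a v5 UUID by hashing a fixed namespace with a random name, and a v4 UUID relabeled as version 5, then compares their version and variant fields. The choice of `uuid.NAMESPACE_URL` as the fixed namespace and the helper names are assumptions made for the example.

```python
import os
import uuid

# Arbitrary fixed namespace, chosen only for this illustration.
FIXED_NAMESPACE = uuid.NAMESPACE_URL


def genuine_v5():
    """A real v5 UUID: SHA-1 of the namespace plus a random name."""
    return uuid.uuid5(FIXED_NAMESPACE, os.urandom(16).hex())


def relabeled_v4():
    """A v4 UUID with its version field overwritten to 5."""
    return uuid.UUID(int=uuid.uuid4().int, version=5)


if __name__ == "__main__":
    for label, u in (("genuine v5", genuine_v5()), ("relabeled v4", relabeled_v4())):
        print(label, u, u.version, u.variant)
```

Both carry version 5 and the RFC 4122 variant, and the remaining bits look random in both cases. This only shows that the two are structurally indistinguishable, not that the relabeled one was produced the way the RFC describes for v5.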

It's lucky I used RFC 4122 UUIDs in the first place, so that I could take advantage of the designers' foresight to set aside some extra bits.

That was very interesting, thank you. I didn't realize some uuids were anything other than... unique. A version seems very useful.

In broader industry these days, UUID effectively means “128-bit unique identifier” with no other standardization implied. I’ve seen dozens of custom “UUID” designs with no public description in the wild used at massive scales. A major reason for this is that the classic standard UUIDs v1-v5 have a broken design for some use cases so companies invent their own equivalents.

There is nothing wrong with designing a custom pseudo-UUID, and in fact there are often real advantages. The caveat is that you will want to use the same scheme consistently.

I have stored a wide variety of arbitrary data in UUIDs, and had great success with that technique.