Hacker News new | ask | show | jobs
by erik_seaberg 1259 days ago
A lot of distributed systems’ clocks can’t tell you within 1 ms when an event happened, leading to assumptions that aren’t true. I don’t think we should commit to IDs being even partially ordered unless we have a set of monotonic clocks with sequence number generators, and record a timestamp, a sequence number, and which clock they came from.
1 comments

It of course depends on your application needs. Many applications don't need a total order, they just need a predictable, stable order. ULIDs offer that. ULIDs generated within the same clock ms are sorted randomly. Random is unpredictable to people, but to a system that is a predictable sort order (it is randomly sorted) and the random numbers in ULID are more importantly still a stable sort order.

Obviously, applications exist where you do need a total order of IDs/events and ULID may not be the best choice for those, but don't underestimate the usage scenarios of partially ordered, stable sorts.

People want to believe that sorting ULIDs is semantically different than sorting UUIDs, and it’s really hard not to depend on something that looks sort of true, so I wouldn’t want a system to use them unless our architecture guarantees they always (not just mostly) monotonically increase.
ULIDs are sorted semantically different from UUIDs. UUIDs have 6+ different sort orders depending on who you ask and which library call you make. All of those sort orders are different from what you'd get just string forms of UUIDs. There are cross-platform sorting endian issues due to the struct order that GUIDs were originally designed to be grouped by.

ULIDs have a single, consistent sort that matches both byte patterns and string representation. That's a huge semantic difference.

Sure, ULIDs make no claims to accurate sorting or total ordering or monoticity beyond a single machine, but ULIDs aren't designed to be a Snowflake/Thrift replacement, they are designed to be a UUID replacement. You are correct that they make no more guarantees than UUIDs, but they don't have to, that was out of scope of their design. I can understand how that makes it less useful for some of your applications, but that doesn't make it not useful for all sort of applications. (Including many applications that once used UUIDs successfully but want something with a cleaner string representation and fewer cross-platform sorting headaches.)

ulid1 < ulid2 does not reliably tell you anything about the ordering of events one and two (without careful architecture most systems don’t have), but people read the spec and want to believe that it does. I think this makes the provisions for sorting more misleading than helpful.

It would help if the spec talked about clock skew and other issues to design and test for; ordering has a heavy cost.

You still seem to be assuming that the only reason to sort IDs is a total ordering of events. There are plenty of other reasons someone may want a good enough partial order of events or a sortable ID of things that aren't events in a system.

It might help if the spec mentioned things like clock skew and distributed time keeping, but mostly just to state that those problems are out of scope of the spec itself. You are right, ULID itself wasn't designed for those specific classes of problems. I think you are just too easily dismissing the classes of problems that ULID does solve reliably well.

I admit I don’t see the use case for IDs that are mostly ascending.
This is a good point. Something is either reliably sortable or it is not sortable at all.

To give the appearance of being sortable to those who are less familiar with how they are generated is potentially dangerous and misleading. And whilst it is true to say that these IDs will consistently sort in the same order, that is equally true of standard GUIDs etc - the difference being the latter does not lead people to believe that the order has inherent meaning, which the former does.

It's a little similar to how the designers of Go noticed that people were relying on the ordering of keys in maps matching the order items were added. So range iteration over keys was specifically changed to start from a 'random' point in the sequence (not truly random, but enough to stop people relying on it). They understood that the appearance of consistency without the fact of consistency leads to errors.

They have their uses I'm sure, they just need to be carefully considered and clearly understood uses.

> And whilst it is true to say that these IDs will consistently sort in the same order, that is equally true of standard GUIDs etc

GUIDs do not have a consistent sort order: https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=10...

> the difference being the latter does not lead people to believe that the order has inherent meaning, which the former does.

That's mostly an accident of v4 UUIDs becoming the most common. v1 GUIDs had plenty of metadata with lots of semantic data in them that had all sorts of inherent meaning in the sort orders. That's part of why there are so many different ways to sort UUIDs in the first place (with again the above blog post being one of the best discussions on it). v1 GUIDs had machine identifiers and tried to be total orderable in distributed systems. It didn't work out well, there were privacy issues with the machine identifiers, and everyone switched to random v4 UUIDs. (That's relevant in discussions of the v6+ UUIDs which are trying to get back to partially semantically meaningful UUIDs again.)

The obvious counter-example to the Go example to me is Python: python dict (which also powers most "ordinary" python objects) had years of implementation behavior where keys were generally sorted lexicographically. It was undocumented behavior for many years and expected to change if the underlying implementation changed and people were directed towards SortedDict if they needed a dict with a reliable sort.

Python eventually decided to just document the implementation behavior and make it assumed that dict will generally behave with sorted keys, but also still suggests SortedDict for apps that need that guarantee of a total order. Reliable partial order is "good enough" for a lot of programs and a lot of algorithms.

> Something is either reliably sortable or it is not sortable at all.

ULIDs are reliably sortable, they just aren't a total order with respect to monotonic time and distributed clocks. I'm not discounting that there are needs for monotonic time and distributed clock coordination and such not, but I do think it's a bridge much further down from what ULIDs are trying to do.

ULIDs are "UI stable" for sorting: if you've got a UI that shows data as it comes in and sorts by ULID keys the amount of "UI jiggle" is about as low as possible. You are still going to get "middle inserts" in between previous loaded items in a way that a user will sometimes notice, but not in a way that thrashes the UI or drastically changes the viewport. That's a sometimes useful property. (For more things than just UIs, but UIs are easy to visualize. It can also be a useful property in data locality/clustering in DB indexes as something that has been a sorting need in projects I've worked on, across several different kinds of database.)

You all are right, you can't rely on that if you've got a stateless backend whose operations only strictly rely on if the key is greater than the last key they saw on whether or not they have new data to work on. If that is your use case, you need something that isn't ULID. That's not a lot of applications' use case. I think where we are most disagreeing is that you all seem to think that that is an eventual use case for all applications, and in my experience it is much more rare and you have other problems than what style of IDs you are using by the time you get there. For what it is worth, ULIDs have been a reasonable compromise for my applications' needs versus total orderable things like Thrift/Snowflake/et al. They make a great replacement for v4 UUIDs, which are still the default "golden child" among a certain class of dark matter Enterprise developer. That's fine if those aren't uses that you have, I'm saying those are uses that have a lot of applications.

> GUIDs do not have a consistent sort order

I didn't write that they "have a consistent sort order" (they don't), I wrote that they will "consistently sort in the same order".

In other words I was not saying they are generated in a particular order, but that once you have them they will sort consistently - that is, requesting the data multiple times will always return them in the same order. In context it wasn't a comment about their inherent nature in any way, more about what the stable sorting might make people believe about that nature.

My point is that whilst both GUIDs and ULIDs have stable ordering (as per the above), the sort order of GUIDs very clearly has no meaning whereas the sort order of ULIDs gives the impression that it might without further consideration. This is simple fact; the issue is the weight you put on that fact. To me it is dangerous, but feel free to differ.

I think you missed facts that I already referenced in my post: v1 GUIDs have timestamps and machine keys for sorting. The proposed v6 and v7 UUIDs have timestamps again. The "no meaning of GUID sort" today again seems purely an accident of v4 becoming the most common type of GUID/UUID not an actual intent by GUID/UUID developers especially given how important sorting was to v1 GUIDs.

(That's part what makes GUID/UUID sort so inconsistent today: there's still a major database used by lots of applications that sorts even v4 UUIDs by v1 machine key then v1 timestamp!) The sort of GUIDs have too many meanings in practice today, even when you assume the majority of GUIDs you see are v4 UUIDs.