| HN Mirror



	by henryfjordan 2088 days ago
	The sort-ability of ULIDs is of questionable usefulness. It looks like the generator relies on the clock of the machine generating the ID. If you generate ULIDs across machines, there is no guarantee that there isn't a time-drift. So to rely on that, you'd have to have a single oracle generating the ids. Why not just use an auto-incrementing ID at that point?

2 comments

scrollaway 2088 days ago

> Why not just use an auto-incrementing ID at that point?

Because sometimes (and by sometimes I mean "surprisingly often"), you don't care about exact sortability but simply want roughly-correct ordering.

Examples for most social media sites:

"Give me the latest 100 tweets/videos/posts for this user" => Millisecond differences won't matter, because users don't tweet/post videos that often.

"Give me the latest 1000 tweets/videos/posts for this search" => Even a significant drift won't matter, because you don't care if things aren't exactly in the correct order, you just care about showing recent stuff.

And at that scale (even long before that scale), having a single oracle for auto-incrementing IDs is a hassle. So this is a nice solution any time you need globally unique IDs, need to support sharding, and your default sort is time-based (or whatever-based if there's another piece of info you want to put in that timestamp portion of the ID, as long as "drift" is a non-issue).
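To make the idea concrete, here is a minimal sketch of an ID with a timestamp prefix and a random suffix, loosely modeled on the ULID layout (48-bit millisecond timestamp, 80 random bits, Crockford base32). The name `make_id` and its `ms` parameter are made up for illustration; this is not the reference ULID implementation.

```python
import os
import time

# Crockford base32 alphabet (no I, L, O, U). Its characters are in ascending
# ASCII order, so comparing encoded strings matches comparing the numbers.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def make_id(ms=None):
    """Return a 26-character ID: 10 timestamp characters, then 16 random ones.

    `ms` is a hypothetical override for the clock, used here only so the
    ordering can be demonstrated with fixed timestamps.
    """
    if ms is None:
        ms = time.time_ns() // 1_000_000  # local wall clock, may drift
    value = (ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):  # 26 characters * 5 bits covers the 128-bit value
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))


# IDs created in different milliseconds sort by creation time as plain strings.
older, newer = make_id(1_000), make_id(2_000)
print(older < newer)
```

Each machine can run this on its own with no coordination. The price is that ordering is only as good as the clocks, and two IDs from the same millisecond come out in random order.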

BTW, not just theoretical: I believe this exact reasoning is why Twitter came up with time-sortable Snowflake IDs.


henryfjordan 2088 days ago

That's fair: if you care about close-enough time-ordering but don't care about strict global causality, then ULIDs would be useful. I suppose if you have a partition-able workload (say users' posts go into Kafka keyed on userId) then you can use ULIDs to have guaranteed order within a partition, which is probably enough for many use-cases.
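A rough sketch of what "guaranteed order within a partition" could look like, assuming a single generator per partition. `MonotonicIds` and its `clock` parameter are hypothetical names, and bumping the previous ID when the clock stalls or steps backward is one possible approach, not necessarily what any ULID library does.

```python
import os
import time


class MonotonicIds:
    """Hands out strictly increasing 128-bit integer IDs from one generator."""

    def __init__(self, clock=None):
        # `clock` returns milliseconds; it is injectable so a misbehaving
        # clock can be simulated.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._last = -1

    def next(self) -> int:
        candidate = (self._clock() << 80) | int.from_bytes(os.urandom(10), "big")
        if candidate <= self._last:
            # Same millisecond with a smaller random part, or the clock went
            # backward: fall back to "previous ID plus one".
            candidate = self._last + 1
        self._last = candidate
        return candidate


# Simulated clock that repeats a value and then steps backward.
gen = MonotonicIds(clock=iter([5, 5, 3, 7]).__next__)
ids = [gen.next() for _ in range(4)]
print(ids == sorted(ids))
```

This only holds for IDs coming out of that one generator. Two machines writing to the same partition are back to relying on their clocks agreeing.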


Lazare 2088 days ago

One reason is database indexes. If you shove truly random data into a standard B-tree (or whatever), performance tends to suck. If the start of the key lets you roughly sort by time, performance improves.

It's not super uncommon for people to use a normal UUID (usually v1, NOT v4; you need a timestamp), then restructure the fields so the timestamp is in front on save, then flip it back on load; this gives you a "proper" UUID, but (in theory) gives you better performance. See, eg, here: https://www.percona.com/blog/2014/12/19/store-uuid-optimized...
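A minimal sketch of that restructuring using Python's standard `uuid` module. The helper names `to_storage` and `from_storage` are made up, and the byte layout (high, middle, then low timestamp bits) is an assumption about the general trick, not a copy of the linked post's exact scheme.

```python
import uuid


def to_storage(u: uuid.UUID) -> bytes:
    """Move the v1 timestamp fields to the front, most significant first."""
    b = u.bytes  # time_low(4) | time_mid(2) | time_hi_version(2) | rest(8)
    return b[6:8] + b[4:6] + b[0:4] + b[8:16]


def from_storage(raw: bytes) -> uuid.UUID:
    """Undo to_storage, giving back the original UUID."""
    return uuid.UUID(bytes=raw[4:8] + raw[2:4] + raw[0:2] + raw[8:16])


u = uuid.uuid1()
print(from_storage(to_storage(u)) == u)
```

In a standard v1 UUID the low 32 bits of the timestamp come first, so consecutive values land all over the index. After the swap, the leading bytes change slowly, so new keys go in near each other.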

Now, ULID makes an odd attempt at making keys strictly sortable by time, which 1) doesn't work and 2) is pointless. :-) In that case, you really would be better off with an auto-incrementing ID. But while their implementation is questionable, the idea isn't absurd.
