Hacker News new | ask | show | jobs
by koolba 1876 days ago
Sortable IDs have pleasant properties on insertion in traditional btree indexes as all the new values are on one edge of the tree. Truly random IDs end up with random I/O on the index tree.

With a database like postgres with full page writes enabled, it can blow out quite quickly at scale.

3 comments

Though something to point out that this project misses is that UUID sorting is its whole own weird can of worms. I just went through this exercise trying to get ULID to GUID conversion/round-tripping that matched MS SQL Server's GUID sort order in hopes for the benefits of those small, pleasant properties.

In short, because SQL Server's sort order is based on GUID/UUID v1, the "group" that matters most significantly for sort order (in SQL Server, but other GUID tools are different) is the tail 6 bytes (not the head bytes as these FUUIDs are using which would allow them to sort in unix sort in GUID form). (As the tail 6 bytes are the "machine ID" in v1 UUIDs.)

For anyone curious how deep (and which endian) the rabbit hole goes in GUID sorting, I kept referring to Raymond Chen's somewhat comprehensive description of GUID sort orders across Microsoft products in my efforts: https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=10...

On the other hand, doesn’t that mean that a workload that inserts new sortable uuids would experience lots of contention, while one that inserts random uuids would better spread load? (This is probably more significant in a distributed database sharded by uuid, where this means all the write load is going to one shard at a time.)
It’s worse with clustered indexes — all of the data experiences this issue, not just the index.
At least in Postgres you have the advantage/disadvantage that real-time clustered tables aren't a feature on offer - so that re-clustering cost is eaten on a periodic basis whenever you invoke a re-cluster operation on the table in question.

Additionally re-reading things closely I believe this package wouldn't benefit databases at all since the "sortable" form appears to be the common string representation while postgres will internally store UUIDs (at least in a UUID column) as their binary values - leading to indexing still being out of order unless you specify a ::TEXT casting.