Hacker News new | ask | show | jobs
by laurent123456 1876 days ago
I think the issue with a lib like this is that people might use it, thinking they don't need the IDs to be secure... until they need to be. But by then plenty would have been generated, and a lot of code would rely on this sortable property.

In fact, I don't see the point of a library like this, it's trying to encode two pieces of information into one string. Why is that? Why not encode geolocation or IP while they're at it. If some data is important, like the order, then a separate database column can easily be used and that would make for a much more durable and reliable solution.

4 comments

Secure? These fuuids are supposed to replace regular uuids, which aren't specified to be secure either. UUIDs are supposed to be quasi-unique (very low probability of collisions), not secure. They are, in my humble opinion, the wrong tool if you want secure.

>Why not encode geolocation or IP while they're at it.

Version 1 (and 2) of UUIDs encoded the MAC-Address + ns timestamp. So there you go.

Version 3 and 5 just hash a namespace string and a name string with md5 and sha1 respectively (same inputs generate same output)

Version 4, which these days is used by almost everything, just uses whatever pseudo-random numbers the implementation chose to use. If you can figure out what PRNG (pseudo-random number generator) was used (if any) and the internal state of that PRNG, then you may be able to guess/forge UUIDs. The spec does not mandate using a cryptographically-secure PRNG, although a lot of implementations out there today will probably use one.

Anyway, thinking UUIDs would be secure is wishful thinking, unless you did the legwork to make sure you control the generator, and the generator you use is using Version 4 with a cryptographically-secure PRNG, and you can guarantee that that will not change in the future; then you get 122 bits of secure random numbers (+ 6 static bits for version and variant).

By comparison, this implementation of "fuuids" gives you 96 bits of secure-enough random numbers or 64 bits in the nanosecond case (it uses os.urandom(12) and os.urandom(8), compared to os.urandom(16) that the standard lib in python uses)

Version 4 would still be more secure in the sense that it doesn’t leak any extra info and as you note it can be generated using a a cryptographically-secure PRNG.

With sortable IDs, you could potentially know when some item was generated, which may or may not be useful info, but why take the risk? I think it makes sense to minimise the data that the app or service exposes. With a regular uuid you can control whether you expose or not the creation time, while with sortable ids you can’t.

Sortable IDs have pleasant properties on insertion in traditional btree indexes as all the new values are on one edge of the tree. Truly random IDs end up with random I/O on the index tree.

With a database like postgres with full page writes enabled, it can blow out quite quickly at scale.

Though something to point out that this project misses is that UUID sorting is its whole own weird can of worms. I just went through this exercise trying to get ULID to GUID conversion/round-tripping that matched MS SQL Server's GUID sort order in hopes for the benefits of those small, pleasant properties.

In short, because SQL Server's sort order is based on GUID/UUID v1, the "group" that matters most significantly for sort order (in SQL Server, but other GUID tools are different) is the tail 6 bytes (not the head bytes as these FUUIDs are using which would allow them to sort in unix sort in GUID form). (As the tail 6 bytes are the "machine ID" in v1 UUIDs.)

For anyone curious how deep (and which endian) the rabbit hole goes in GUID sorting, I kept referring to Raymond Chen's somewhat comprehensive description of GUID sort orders across Microsoft products in my efforts: https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=10...

On the other hand, doesn’t that mean that a workload that inserts new sortable uuids would experience lots of contention, while one that inserts random uuids would better spread load? (This is probably more significant in a distributed database sharded by uuid, where this means all the write load is going to one shard at a time.)
It’s worse with clustered indexes — all of the data experiences this issue, not just the index.
At least in Postgres you have the advantage/disadvantage that real-time clustered tables aren't a feature on offer - so that re-clustering cost is eaten on a periodic basis whenever you invoke a re-cluster operation on the table in question.

Additionally re-reading things closely I believe this package wouldn't benefit databases at all since the "sortable" form appears to be the common string representation while postgres will internally store UUIDs (at least in a UUID column) as their binary values - leading to indexing still being out of order unless you specify a ::TEXT casting.

This seems to be one of those things that you can never convince people of. Everyone thinks that their database is going to have so much data in it that having a sortable ID is important for "performance", and that sorting by another field is silly because of "performance".

ID's are there to identify the row. You can sort it by whatever you like, preferably something that makes sense from a user perspective so the users don't get handed lists of stuff in what seems to them to be a stupid order.

I agree, to an extent, with your comment. But having sortable/chronological IDs on a single column is a really nice feature to have.

Obviously, if you rely on the IDs to be unguessable/brute-forceable as part of your access control, then it is certainly bad to use such IDs. But I have not yet seen a case where I need both sortable and secure IDs. Being collision-free is what we want in a database context, and if it can hold some more information for the order, all the better.