Hacker News
by tsenart 3482 days ago
> What is the use of 48 bits of time? It reduces the overall entropy, for what? If time is important then why not make the id literally time (i.e. UnixNano), if it isn't then why not make all bits rand?

For some designs it's useful for identifiers to have properties other than uniqueness. In this case, the property is relative lexicographical (and binary) order based on time, so that you can leverage the order between the things the identifiers identify without looking at the things themselves. The entropy is there to satisfy the uniqueness property (with some acceptable, application-dependent probability of collision). The time is there to satisfy the ordering property.
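To make the layout concrete, here is a minimal sketch of a ULID-style identifier as I understand the spec: 48 bits of millisecond time in the high bits, 80 random bits below, encoded with Crockford base32. The name `make_id` and its parameters are mine, chosen for illustration, and this is not the reference implementation.

```python
import os
import time

# Crockford base32 alphabet. It is in ascending ASCII order, so comparing
# encoded strings gives the same result as comparing the underlying integers.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def make_id(ms=None, entropy=None):
    """Return a 26-character ID: 48 bits of millisecond time, then 80 random bits.

    `ms` and `entropy` are hypothetical parameters that allow fixed inputs
    for testing; by default the wall clock and os.urandom are used.
    """
    if ms is None:
        ms = time.time_ns() // 1_000_000
    if entropy is None:
        entropy = os.urandom(10)  # 10 bytes = 80 bits
    # Time occupies the high 48 bits, so it dominates any comparison.
    value = ((ms & ((1 << 48) - 1)) << 80) | int.from_bytes(entropy, "big")
    # 26 characters * 5 bits = 130 bits; the top 2 bits are always zero.
    return "".join(CROCKFORD[(value >> shift) & 31] for shift in range(125, -1, -5))
```

Because the time sits in the most significant bits, an ID from a later millisecond sorts after an ID from an earlier one regardless of the random part. Two IDs from the same millisecond sort by their random bits, which carries no meaning.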

Yes, very good point. However, as @danbruc rightly points out, this raises all sorts of other concerns. A user of these IDs may not realize that the reliability of the ordering can be substantially reduced depending on where the IDs are generated.

Some applications may be able to tolerate inconsistencies in ordering, others may not. Are IDs being generated on multiple machines? Are they in sync? What happens if the system clock is adjusted, or a container/VM is restarted on different hardware?
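A rough sketch of the clock-adjustment case, assuming a single process: if the clock stalls or steps backwards, reuse the last timestamp and increment the random part, so IDs from that one generator never go out of order. `MonotonicSource` and its injectable `clock` are hypothetical names for illustration. This does nothing for multiple machines or for a restart that loses the in-memory state.

```python
import os


class MonotonicSource:
    """Generate 128-bit integer IDs that never decrease within one process."""

    def __init__(self, clock):
        self.clock = clock  # callable returning the current time in milliseconds
        self.last_ms = -1
        self.last_rand = 0

    def next_id(self):
        now = self.clock()
        if now > self.last_ms:
            # Clock moved forward: take the new timestamp and fresh random bits.
            self.last_ms = now
            self.last_rand = int.from_bytes(os.urandom(10), "big")
        else:
            # Clock stalled or went backwards: keep the last timestamp and
            # increment the random part so the ID still sorts after the last one.
            self.last_rand += 1
            if self.last_rand >= 1 << 80:
                raise OverflowError("random component exhausted for this timestamp")
        return (self.last_ms << 80) | self.last_rand
```

Note what this trades away: while the clock is behind, the time component is no longer the real time, only a lower bound that keeps the order intact.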

This design implies that these IDs are being generated in different locations, but this usage leads to the least reliable time. How many bits of approximate time does one really need? Not 48, surely.
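For a sense of scale, a back-of-the-envelope calculation, assuming a 365.25-day year. The helper `bits_for` is a hypothetical name; it returns how many bits are needed to count ticks of a given resolution over a given span.

```python
YEAR_MS = 31_557_600_000  # 365.25 days * 86,400 s * 1,000 ms


def bits_for(span_ms, resolution_ms):
    """Bits needed to number every tick of `resolution_ms` within `span_ms`."""
    ticks = -(-span_ms // resolution_ms)  # ceiling division
    return max(1, (ticks - 1).bit_length())


# How long 48 bits of milliseconds last before wrapping.
years_covered = (1 << 48) / YEAR_MS

# Bits needed for 100 years at millisecond and at second resolution.
century_at_ms = bits_for(100 * YEAR_MS, 1)
century_at_s = bits_for(100 * YEAR_MS, 1000)
```

Under these assumptions, 48 bits of milliseconds cover a little over 8,900 years, a century at millisecond resolution fits in 42 bits, and a century at one-second resolution fits in 32.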

On the other hand, if you generate the IDs in the most reliable model, a single host with persistent storage to prevent regression, you've basically made an unnecessarily complicated vector clock. A simple incrementing counter would work at least as well, and be far simpler.

> A user of these IDs may not realize that the reliability of the ordering can be substantially reduced depending on where the IDs are generated.

That can only be addressed with improved documentation and shared understanding of the subtleties and pitfalls of distributed time synchronisation.

> Some applications may be able to tolerate inconsistencies in ordering, others may not.

Indeed. Proper thought must be put into this sort of thing. ULIDs are neither an exception nor a silver bullet.

> How many bits of approximate time does one really need?

Entirely application dependent.

Agreed.

The point is that these distinctions lead to the conclusion that this identifier isn't 'generally' useful, and even under optimal conditions its utility is questionable. For example, extra precision for an approximate value is not application dependent at all. The low-order bits of the time component have no actionable meaning, though they imply sort order. That's the kind of subtle error in reasoning that is really easy to make here. I think there are too many land mines hidden here to make this useful.

You are letting the perfect be the enemy of the good. Perfect general applicability to all problem domains is not a requirement of utility. Engineering is tradeoff analysis.