| HN Mirror


by mantrax5 4426 days ago

> "Including a machine identifier and timestamp in the generation of your UUID guarantees uniqueness, so long as each machine is careful not to generate two UUIDs 'simultaneously'."

That's a pretty big "if", especially considering how unreliable timers are and the fact that they can jump back and forth.

- Wall clock timers can jump forward or back, and repeat time, after an NTP adjustment or whenever anyone else adjusts the clock.

- Internal "elapsed time" timers based on, say, the CPU clock may jump around or repeat when the source core changes on multicore machines, or when processes source their CPU clock from the core they're bound to.

- A node may be moved from machine to machine (and the machines' clocks are not necessarily perfectly synced).
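To make the failure mode concrete, here is a minimal sketch with simulated clock readings. The `timestamp_id` scheme and the millisecond values are made up for illustration; the point is only that a clock stepped backwards hands out timestamps it has already handed out.

```python
def timestamp_id(machine_id: int, now_ms: int) -> str:
    """Hypothetical ID scheme: machine identifier plus a millisecond timestamp."""
    return f"{machine_id}-{now_ms}"


# Simulated wall-clock readings in ms (not real clock calls). After the third
# reading the clock is stepped back, as an NTP correction or an operator
# might do, so 1001 and 1002 are read a second time.
readings = [1000, 1001, 1002, 1001, 1002, 1003]

ids = [timestamp_id(256, t) for t in readings]
duplicates = sorted({i for i in ids if ids.count(i) > 1})

print(duplicates)  # ['256-1001', '256-1002']
```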

This is a hard lesson that distributed system designers learn over and over again: don't rely on randomness and don't rely on timers for identity. Good old incrementing counters have none of those issues and are absolutely trivial to implement. They never fail, never drift, never repeat, by definition. As I said a few posts above, the typical structure of a "unique id" based on a counter has three segments:

    1. nodename (or machine name if you will)

    2. namespace (you can have one per thread to avoid contention issues)

    3. local (monotonically increases within the namespace).

Let's say your node id is 256, and it runs 4 worker threads, one per core, plus 1 thread for the scheduler. You need five namespaces:

256.0-4.* (where * is increasing monotonically 0, 1, 2, 3, 4...)

And once you have to reboot the node, you obtain five new namespaces:

256.5-9.*
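A rough sketch of that scheme, under one assumption the comment leaves open: the next free namespace number has to survive the reboot somehow (a file, a coordinator, whatever). Here it is simply passed in and returned. `IdGenerator` and `boot` are illustrative names, not from any library.

```python
import itertools


class IdGenerator:
    """Issues (node, namespace, local) IDs. Meant to be owned by one thread."""

    def __init__(self, node_id: int, namespace: int):
        self.node_id = node_id
        self.namespace = namespace
        self._local = itertools.count()  # local counter starts at 0

    def next_id(self) -> tuple[int, int, int]:
        return (self.node_id, self.namespace, next(self._local))


def boot(node_id: int, first_namespace: int, threads: int = 5):
    """Simulate one boot: claim `threads` fresh namespaces.

    Assumption: `first_namespace` is read from durable storage, and the
    returned next value is written back before any ID is issued.
    """
    generators = [IdGenerator(node_id, first_namespace + i) for i in range(threads)]
    return generators, first_namespace + threads


gens, next_ns = boot(256, first_namespace=0)        # namespaces 0-4
first = [gens[0].next_id(), gens[0].next_id(), gens[4].next_id()]

gens, next_ns = boot(256, first_namespace=next_ns)  # after a reboot: 5-9
after_reboot = gens[0].next_id()

print(first)         # [(256, 0, 0), (256, 0, 1), (256, 4, 0)]
print(after_reboot)  # (256, 5, 0)
```

Since each generator belongs to exactly one thread, there is no lock on the hot path; the only shared state is the namespace number claimed at boot.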

And the * counters start over from zero on each. As for how big each segment should be, that depends on what you want to do with the nodes, I suppose. But the 128-bit UUID size should never be some kind of guide regarding size; maybe you need three shorts, maybe you need three longs, maybe short.medium.long etc., it'll depend on the use case.
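If you want the three segments in a single integer, a sketch like the following lets you pick the widths per use case. The default `(16, 16, 32)` split is an arbitrary example, not a recommendation, and `pack_id` / `unpack_id` are hypothetical helper names.

```python
def pack_id(node: int, namespace: int, local: int, bits=(16, 16, 32)) -> int:
    """Pack node.namespace.local into one integer using the given bit widths."""
    value = 0
    for part, width in zip((node, namespace, local), bits):
        if not 0 <= part < (1 << width):
            # A segment that no longer fits must fail loudly, not wrap around.
            raise OverflowError(f"{part} does not fit in {width} bits")
        value = (value << width) | part
    return value


def unpack_id(value: int, bits=(16, 16, 32)) -> tuple[int, ...]:
    """Inverse of pack_id for the same bit widths."""
    parts = []
    for width in reversed(bits):
        parts.append(value & ((1 << width) - 1))
        value >>= width
    return tuple(reversed(parts))


packed = pack_id(256, 5, 0)
print(unpack_id(packed))  # (256, 5, 0)
```

The overflow check matters: once a counter exhausts its segment, the node should obtain a new namespace rather than silently reuse values.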