| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by firebacon 2950 days ago

The difference here seems to be the exact definition of "unique".

A UUID has a finite length, so if we generate N new UUIDs in a finite time-interval it seems clear that - in theory - we must have collisions for large values of N (or even infinite N), regardless of how clever we are in seeding our random number generator. But I don't think this can be fixed without using a variable length value or refusing to mint new UUIDs once the available bits are used up.

Of course in practice that should not really happen; at least if we only run on hardware from vendors where we can assume that every MAC address will be unique, the only way to actually get a colission - in practice - would be by generating more than 2^B UUIDs on the same machine within a fairly short timeframe and fairly large B.

EDIT: In v1 of the algorithm it seems that B=14 and the "short timeframe" is the smallest resolution increment your system clock supports. That is if we assume the "uniquifying" clock sequence is produced by incrementing a counter. So we can say that for practical purposes, a collision is impossible on a modern system unless we have invalid MAC assignments to our hardware, unreasonable transaction rate, or an incorrect implementation of the UUID algorithm.

Am I missing something?

1 comments

zzzcpan 2950 days ago

> by generating more than 2^B UUIDs on the same machine within a fairly short timeframe and fairly large B

Which can be made impossible.

IDs are part of the finite system and don't exist on their own. And the system can perform a finite number of operations both concurrently and in a finite time-interval with some amount of uniqueness distinguishing nodes and other useful properties. Making it possible to always find an id generating algorithm for this system that can never have collisions.

link

firebacon 2950 days ago

I agree, if we are allowed to make the UUID arbitrarily large (and not be limited to the 128 bits from the "official" UUID algorithm), it should always be possible to set B to a large enough value so that our scheme assigns a unique and finite-sized UUID to every possible state of the system, since all the system's parameters (number of servers, time, etc) should be discrete and finite. I.e. basically make B large enough that 2^B becomes greater than the number of discrete timesteps in our experiment. That's an interesting observation.

link

zzzcpan 2950 days ago

You can get away with a lot less than 128 bits for most systems. It's only that big because it tries to be universal.

link