Hacker News new | ask | show | jobs
by firebacon 2950 days ago
The difference here seems to be the exact definition of "unique".

A UUID has a finite length, so if we generate N new UUIDs in a finite time-interval it seems clear that - in theory - we must have collisions for large values of N (or even infinite N), regardless of how clever we are in seeding our random number generator. But I don't think this can be fixed without using a variable length value or refusing to mint new UUIDs once the available bits are used up.

Of course in practice that should not really happen; at least if we only run on hardware from vendors where we can assume that every MAC address will be unique, the only way to actually get a colission - in practice - would be by generating more than 2^B UUIDs on the same machine within a fairly short timeframe and fairly large B.

EDIT: In v1 of the algorithm it seems that B=14 and the "short timeframe" is the smallest resolution increment your system clock supports. That is if we assume the "uniquifying" clock sequence is produced by incrementing a counter. So we can say that for practical purposes, a collision is impossible on a modern system unless we have invalid MAC assignments to our hardware, unreasonable transaction rate, or an incorrect implementation of the UUID algorithm.

Am I missing something?

1 comments

> by generating more than 2^B UUIDs on the same machine within a fairly short timeframe and fairly large B

Which can be made impossible.

IDs are part of the finite system and don't exist on their own. And the system can perform a finite number of operations both concurrently and in a finite time-interval with some amount of uniqueness distinguishing nodes and other useful properties. Making it possible to always find an id generating algorithm for this system that can never have collisions.

I agree, if we are allowed to make the UUID arbitrarily large (and not be limited to the 128 bits from the "official" UUID algorithm), it should always be possible to set B to a large enough value so that our scheme assigns a unique and finite-sized UUID to every possible state of the system, since all the system's parameters (number of servers, time, etc) should be discrete and finite. I.e. basically make B large enough that 2^B becomes greater than the number of discrete timesteps in our experiment. That's an interesting observation.
You can get away with a lot less than 128 bits for most systems. It's only that big because it tries to be universal.