|
|
|
|
|
by jandrewrogers
1877 days ago
|
|
The collision risk depends on the application and use case. In sufficiently large real systems, a probabilistic pseudo-UUID with only 96 bits of entropy has a collision probability that is very small but not so small that you can treat it as effectively zero if avoiding collisions is critical. I’ve seen multiple systems that generate unique identifiers at rates > 10^9 per second. The 128-bit size has a lot of advantages, but probabilistic identifiers are expressly disallowed for some applications and contracts because of the collision tail risk. Making the identifier larger than 128-bits has other significant costs so that generally is not an option either. That aside, generating probabilistic UUIDs at the rates where this is an issue is a bad idea regardless for performance reasons. |
|
One popular UUID library got a bug report stating: "We are generating about 1M UUID4 a day, and we are getting several hundred collisions a day". And so they were; turned out to be a bug/weird interaction between the OpenSSL library they were using for randomness and forking. (Details here, although it was all fixed years ago of course: https://github.com/ramsey/uuid/issues/80)
On paper, you should never, ever, ever see a collision when generating a mere million v4 UUIDs a day, much less hundreds of collisions. But that doesn't mean it can't happen!
This is also an interesting bit of analysis; comes from a company that processed a lot of UUIDs generated in browsers, checked, and discovered about 5 collisions per million UUIDs. Again, not what you'd naively expect! (Turned out to be mostly driven by misbehaving crawlers.) https://medium.com/teads-engineering/generating-uuids-at-sca...