Hacker News
by kstrauser 1845 days ago
That’s the UUID approach, but worse. According to the birthday problem[1], you’re 50% likely to get a collision among random 64-bit numbers after about 5 billion insertions. That’s not an awful lot. Replace that with a 128-bit UUID and you’d have to insert about 22,000,000,000,000,000,000 rows to get a 50% chance. At realistic table sizes, a collision is probably less likely than a cosmic ray flipping a random bit in RAM and corrupting the index that way.

[1] https://en.wikipedia.org/wiki/Birthday_problem#Probability_t...
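For anyone who wants to check those figures, here is a minimal Python sketch of the standard 50% threshold approximation from the birthday problem, n ≈ √(2 · ln 2 · 2^bits). It assumes IDs are drawn uniformly at random, and the function name is made up for illustration.

```python
import math


def birthday_50pct_threshold(bits: int) -> float:
    """Approximate count of uniformly random `bits`-bit ids at which the
    chance of at least one collision reaches 50%: sqrt(2 * ln 2 * 2**bits)."""
    return math.sqrt(2 * math.log(2) * 2**bits)


print(f"{birthday_50pct_threshold(64):.2e}")   # prints 5.06e+09
print(f"{birthday_50pct_threshold(128):.2e}")  # prints 2.17e+19
```

A version-4 UUID fixes 6 of its 128 bits (version and variant), so `birthday_50pct_threshold(122)`, about 2.7e18, is the slightly more precise figure for random UUIDs. That is still far beyond any realistic table.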


The post specified at most 10,000,000 total records. For that many records, the chance of a collision is about 0.00001, assuming good randomness.
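A quick way to sanity-check a figure like this is the usual approximation p ≈ 1 − e^(−n(n−1)/2N), with N = 2^bits. The sketch below assumes 64-bit random IDs, since the exact width isn't stated here, and the helper name is hypothetical. For 10,000,000 records it gives roughly 2.7 × 10^-6, comfortably under the rough 0.00001 estimate.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that n uniformly random `bits`-bit ids contain a duplicate.

    Uses p ~= 1 - exp(-n(n-1) / 2N) with N = 2**bits; math.expm1 keeps
    precision when p is tiny.
    """
    space = 2**bits
    return -math.expm1(-n * (n - 1) / (2 * space))


print(collision_probability(10_000_000, 64))
```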
Sure, but stuff always grows, and the experiment gets run a bunch of times. Why not go with the built-in solution and then not have to worry about it?
I'm just answering the poster's question directly, but in the general case I agree with you. The cognitive overhead of dealing with the various "what ifs" usually isn't worth the couple of bytes or cycles you could save.
Getting a collision with this approach doesn’t matter — the whole point is to loop if you do get a collision. The only issue is getting a long string of sequential collisions, which is highly unlikely.
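A minimal sketch of that "loop on collision" idea, assuming a table whose primary key enforces uniqueness. It uses an in-memory SQLite table from Python's standard library; the table, column, and function names and the retry limit are hypothetical. SQLite rejects a duplicate key with `sqlite3.IntegrityError`, and the loop simply draws a new random ID.

```python
import secrets
import sqlite3

MAX_ATTEMPTS = 5  # hypothetical cap; several collisions in a row is vanishingly unlikely


def insert_with_random_id(conn: sqlite3.Connection, name: str) -> int:
    """Insert a row under a random 63-bit id, retrying on a primary-key collision."""
    for _ in range(MAX_ATTEMPTS):
        # SQLite integers are signed 64-bit, so draw 63 bits to stay non-negative.
        candidate = secrets.randbits(63)
        try:
            with conn:  # commits on success, rolls back if the INSERT fails
                conn.execute(
                    "INSERT INTO items (id, name) VALUES (?, ?)", (candidate, name)
                )
            return candidate
        except sqlite3.IntegrityError:
            # The only constraint on this table is the primary key, so this
            # means the id was already taken: loop and draw a new one.
            continue
    raise RuntimeError(f"no free id after {MAX_ATTEMPTS} attempts")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
new_id = insert_with_random_id(conn, "example")
```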
But now you’ve traded code complexity for a few bytes of storage. That’s just not worth it.
8 extra bytes per row and per foreign-key reference, relative to an int64, can add up quickly, especially if the row is small. I agree it’s not typically the right trade-off, but it’s not as absolute as you claim.
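To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes 16-byte binary UUIDs versus 8-byte int64 keys and counts only the key columns themselves; the function name and the example table size are hypothetical.

```python
def extra_key_bytes(rows: int, fk_refs_per_row: int = 0) -> int:
    """Extra bytes from 16-byte UUID keys instead of 8-byte int64 keys.

    Counts each row's own primary-key column plus `fk_refs_per_row`
    foreign-key columns pointing at UUID keys. Index copies of the keys,
    page overhead, and alignment padding are ignored, so real overhead is higher.
    """
    extra_per_key = 16 - 8  # UUID bytes minus int64 bytes
    return rows * (1 + fk_refs_per_row) * extra_per_key


# Hypothetical table: 100 million rows, each with two foreign-key references.
print(extra_key_bytes(100_000_000, fk_refs_per_row=2))  # prints 2400000000 (2.4 GB)
```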
This is why we need 2TB drives now, when we used to get by with 2GB.
Back in my day we measured things in Ks, not Ms or Gs or Ts.

Get off my lawn...

But seriously, UUIDs work; they don't need application code to avoid collisions. If you want something a bit more compact or shardable, use ULIDs.
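For reference, a ULID packs a 48-bit millisecond timestamp and 80 random bits into 128 bits and writes them as 26 characters of Crockford's Base32, so IDs sort by creation time as plain strings. Below is a minimal Python sketch of that layout. It is not a full implementation of the ULID spec (for instance, it skips the spec's optional monotonic mode for IDs generated within the same millisecond), and `new_ulid` is a hypothetical name.

```python
import os
import time
from typing import Optional

# Crockford's Base32 alphabet (no I, L, O, U); it is in ASCII order,
# so string comparison of ULIDs matches numeric comparison.
CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID-style id: 48-bit ms timestamp + 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness  # 128-bit integer
    # 26 base32 digits hold 130 bits; the top 2 are always zero.
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD32[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


print(new_ulid())
```

Because the timestamp occupies the leading characters, IDs created later sort after earlier ones, which keeps new rows clustered at the end of a B-tree index.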

For what it's worth, YouTube still uses 11-character base64 strings for its video IDs, which are assumed to encode 64-bit ints. It also allows unlisted videos, which people usually take to mean "semi-private".
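The arithmetic behind that length: 64 bits at 6 bits per base64 character needs ⌈64/6⌉ = 11 characters. A minimal sketch, assuming a plain big-endian encoding with the URL-safe base64 alphabet; this is not a claim about how YouTube actually generates its IDs, and the helper names are hypothetical.

```python
import base64
import secrets


def int64_to_id(n: int) -> str:
    """Encode an unsigned 64-bit integer as 11 URL-safe base64 characters."""
    if not 0 <= n < 2**64:
        raise ValueError("n must fit in 64 unsigned bits")
    # 8 bytes encode to 12 base64 chars, the last of which is '=' padding; drop it.
    return base64.urlsafe_b64encode(n.to_bytes(8, "big")).rstrip(b"=").decode("ascii")


def id_to_int64(s: str) -> int:
    """Invert int64_to_id by restoring the single padding character."""
    return int.from_bytes(base64.urlsafe_b64decode(s + "="), "big")


video_id = int64_to_id(secrets.randbits(64))
print(video_id, len(video_id))
```

With 2^64 possible IDs, guessing a particular unlisted video's ID by brute force is impractical, which is presumably what makes "unlisted" workable as a middle ground.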

It's an interesting trade-off. The shorter video ID links are probably of some UX benefit to them. Plus they have private videos for when you really don't want your video to be viewed, with unlisted as the middle ground: easy to share, but still exclusive.

Sure, and it makes sense there: write a service that returns unique 64-bit ints and encapsulate the complexity in that one place. That’s easier than making every `insert` in your app code run a `while not unique` loop.
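A minimal sketch of that encapsulation, assuming a single process. `UniqueIdService` and `next_id` are hypothetical names, and the issued-ID set lives in memory purely for illustration; a real service would typically need persistence and coordination across processes.

```python
import secrets
import threading


class UniqueIdService:
    """Hypothetical service that hands out unique random ids.

    Callers just call next_id(); the collision check and retry loop live
    here instead of in every INSERT path. Issued ids are tracked only in
    memory, so this sketch neither survives restarts nor coordinates
    across processes.
    """

    def __init__(self, bits: int = 63) -> None:
        self._bits = bits
        self._issued = set()  # every id handed out so far
        self._lock = threading.Lock()  # make next_id safe across threads

    def next_id(self) -> int:
        with self._lock:
            if len(self._issued) >= 2**self._bits:
                raise RuntimeError("id space exhausted")
            while True:  # retry until an unused id comes up
                candidate = secrets.randbits(self._bits)
                if candidate not in self._issued:
                    self._issued.add(candidate)
                    return candidate


ids = UniqueIdService()
order_id = ids.next_id()
```

Every insert path then just asks for `ids.next_id()`, and only this class knows that collisions exist.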