| HN Mirror


	by kstrauser 1845 days ago
	That’s the UUID approach, but worse. According to the birthday problem[1], you’re 50% likely to get a collision in 64-bit numbers after about 5 billion insertions. That’s not an awful lot. Replace that with a 128-bit UUID and you’d have to insert about 22,000,000,000,000,000,000 rows to get a 50% chance. That’s probably less likely than a cosmic ray flipping a random bit in RAM and corrupting the index that way. [1] https://en.wikipedia.org/wiki/Birthday_problem#Probability_t...
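
The two figures above can be checked with the standard birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the size of the ID space. This is a minimal sketch; the function name is made up for illustration, and it treats all 128 bits as random, as the comment does (a version 4 UUID has 122 random bits, which lowers the figure somewhat).

```python
import math


def rows_for_collision_probability(bits: int, p: float = 0.5) -> float:
    """Approximate number of uniformly random `bits`-bit values that can be
    drawn before the chance of at least one collision reaches `p`."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


if __name__ == "__main__":
    # 64-bit IDs: roughly 5 billion rows for a 50% collision chance.
    print(f"64-bit:  {rows_for_collision_probability(64):.3g}")
    # 128-bit IDs: roughly 2.2e19 rows for a 50% collision chance.
    print(f"128-bit: {rows_for_collision_probability(128):.3g}")
```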

jeff-davis 1845 days ago

The post qualified this as <= 10,000,000 total records. For that number of records, there's a chance of about 0.00001 that you get a collision, assuming good randomness.
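
For a fixed number of records the same approximation can be turned around: p ≈ 1 − exp(−n(n − 1) / (2N)). The sketch below is an illustration with a hypothetical function name; how many bits are actually random (64, or 63 for a positive signed bigint) is an assumption, and the result shifts by a factor of two between them.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance of at least one collision among `n` uniformly
    random `bits`-bit values (birthday approximation)."""
    space = 2.0 ** bits
    # -expm1(-x) evaluates 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


if __name__ == "__main__":
    for bits in (64, 63):
        p = collision_probability(10_000_000, bits)
        print(f"{bits} random bits, 10,000,000 rows: {p:.2e}")
```

With 10,000,000 rows this comes out to a few in a million, the same small ballpark as the figure quoted above.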

kstrauser 1845 days ago

Sure, but stuff always grows, and the experiment gets run a bunch of times. Why not go with the built-in solution and then not have to worry about it?

jeff-davis 1845 days ago

I'm just answering the poster's question directly; but in the general case, I agree with you. The cognitive overhead of dealing with the various "what ifs" usually isn't worth the couple bytes or cycles that you could save.

oconnore 1844 days ago

Getting a collision with this approach doesn’t matter — the whole point is to loop if you do get a collision. The only issue is getting a long string of sequential collisions, which is highly unlikely.
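
A rough sketch of the loop-on-collision idea, using an in-memory SQLite table so it runs on its own. The table name `items`, the helper name, and the retry limit are all assumptions for illustration; the original post's schema and database are not shown in this thread.

```python
import secrets
import sqlite3


def insert_with_random_id(conn, name, make_id=None, max_attempts=5):
    """Insert a row under a random ID, retrying if the ID is already taken."""
    if make_id is None:
        # 63 bits keeps the value inside SQLite's signed 64-bit INTEGER range.
        make_id = lambda: secrets.randbits(63)
    for _ in range(max_attempts):
        candidate = make_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO items (id, name) VALUES (?, ?)",
                    (candidate, name),
                )
            return candidate
        except sqlite3.IntegrityError:
            # Primary key collision: draw a new ID and try again.
            continue
    raise RuntimeError(f"no free id found after {max_attempts} attempts")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    print(insert_with_random_id(conn, "example"))
```

The retry limit guards against the long run of consecutive collisions mentioned above; in a sparsely filled ID space it should essentially never be reached.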

kstrauser 1844 days ago

But now you’ve traded code complexity for a few bytes of storage. That’s just not worth it.

sa46 1844 days ago

8 extra bytes per row and per foreign key reference relative to an int64 can add up quickly, especially if the row is small. I agree it’s not typically the right trade-off, but it’s not as absolute as you claim.

throwawayboise 1844 days ago

This is why we need 2TB drives now, when we used to get by with 2GB.

rswail 1844 days ago

Back in my day we measured things in Ks, not Ms or Gs or Ts.

Get off my lawn...

But seriously, UUIDs work; they don't need application code to avoid collisions. If you want something a bit more compact/shardable, use ULIDs.
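
For readers who haven't met ULIDs: the layout is a 48-bit millisecond timestamp followed by 80 random bits, written as 26 Crockford base32 characters. The sketch below is a hand-rolled illustration of that layout, not a library API; the function name is hypothetical, and it skips the spec's optional monotonic handling of IDs generated within the same millisecond.

```python
import secrets
import time

# Crockford base32 alphabet (no I, L, O, U); it is in ascending ASCII order,
# so encoded strings sort the same way as the underlying integers.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None):
    """Build a ULID-style string: 48-bit timestamp, then 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    value = ((timestamp_ms & ((1 << 48) - 1)) << 80) | secrets.randbits(80)
    chars = []
    for _ in range(26):  # 26 chars * 5 bits = 130 bits; the top 2 stay zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(new_ulid())
```

Because the timestamp occupies the leading characters, IDs created later sort after earlier ones, which is what makes them friendlier to indexes than fully random UUIDs.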

deckard1 1844 days ago

For what it's worth, YouTube still uses 11-character base64 strings for their video ids, which are assumed to be 64-bit ints. They also allow unlisted videos, which people usually take to mean "semi-private".

It's an interesting tradeoff. The UX of the smaller YouTube video id links is probably of some benefit to them. Plus they have private videos for when you really don't want your video to be viewed, with unlisted being the middle ground of easy sharing but also keeping it exclusive.
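
The 11-character length lines up with 64 bits: 8 bytes encode to 12 base64 characters including one `=` of padding, so 11 once the padding is dropped. The sketch below only illustrates that arithmetic with URL-safe base64; it is not a description of YouTube's actual scheme, and the function names are made up.

```python
import base64
import secrets


def encode_id(n: int) -> str:
    """Encode an unsigned 64-bit int as 11 URL-safe base64 characters."""
    raw = n.to_bytes(8, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_id(s: str) -> int:
    """Reverse encode_id by restoring the stripped padding first."""
    padded = s + "=" * (-len(s) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


if __name__ == "__main__":
    n = secrets.randbits(64)
    token = encode_id(n)
    print(token, len(token), decode_id(token) == n)
```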

kstrauser 1844 days ago

Sure, and it makes sense there: write a service that returns unique 64 bit ints and encapsulate the complexity inside that one location. That’s easier than making every `insert` in your app code have to do a `while not unique` loop.
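
A minimal sketch of what "encapsulate the complexity in one location" might look like. The class name and its in-memory `set` are assumptions for illustration; a real service would check against durable, shared storage rather than process memory.

```python
import secrets
import threading


class UniqueIdService:
    """Hands out random IDs that are unique among those it has issued.

    The uniqueness loop lives here, so callers never write their own.
    """

    def __init__(self, bits=64, max_attempts=10):
        self._bits = bits
        self._max_attempts = max_attempts
        self._issued = set()  # stand-in for a durable store (assumption)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            for _ in range(self._max_attempts):
                candidate = secrets.randbits(self._bits)
                if candidate not in self._issued:
                    self._issued.add(candidate)
                    return candidate
        raise RuntimeError("could not find an unused id")


if __name__ == "__main__":
    service = UniqueIdService()
    # Application code just asks for an ID before its insert.
    print(service.next_id())
```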