by Alupis 1312 days ago
There's probably a non-trivial amount of folks that equate a UUID with "unguessable" given their appearance. They are, after all, not sequential and using them to obscure things like number of users (using a UUID in place of an incrementing number) seems like a natural fit.

Given how easy it is to generate a UUID in most languages, and given the low likelihood of a collision within a system, it wouldn't be a huge leap to think UUIDs could replace homebrewed random string generators for things like password reset tokens, etc.
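
For concreteness, here is a minimal Python sketch of the two options being compared: a version 4 UUID from the standard library versus a token from the `secrets` module. The function names are mine, for illustration only. CPython's `uuid.uuid4()` draws from `os.urandom`, but UUID functions in other languages may not use a cryptographic source, so check before relying on one for reset tokens.

```python
import secrets
import uuid


def uuid_token() -> str:
    """Version 4 UUID as a token (hypothetical helper name).

    Of the 128 bits, 6 are fixed version/variant bits, leaving 122 random bits.
    """
    return str(uuid.uuid4())


def random_token(nbytes: int = 16) -> str:
    """URL-safe token from the OS CSPRNG; 16 bytes gives 128 random bits."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    print(uuid_token())
    print(random_token())
```

Either one is a single call, which is the simplicity argument in a nutshell; the difference is 122 versus 128 (or more) random bits and whether the randomness source is guaranteed.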


> There's probably a non-trivial amount of folks that equate a UUID with "unguessable" given their appearance.

That's near enough to true for anyone not operating at "web scale".

FAANG/BAT engineers need to care. My systems, with tens or hundreds of thousands of users (or, you know, a few thousand users tops), are without doubt going to be rewritten (probably several times) well before there are so many UUIDs in the wild that this becomes a reasonable thing to worry about.

For me, at the scale of systems I run (or will conceivably run in the medium term future), I think the simplicity/understandability of code that uses native language UUID functions is "the right thing". Whoever does the next big rewrite to support a few million MAU will be thankful they don't have to work out WTF I was thinking when I decided to roll my own random access tokens.

I doubt FAANG engineers need to care either. Ignoring that the author imagines 8k IoT devices per living human for one service, 2^64 requests per second is an absurd number to use. Assuming one server can do 10M RPS, you'd need 1.8 trillion servers to handle that load. You'd also need over 2 billion Tb/s of bandwidth to receive just the UUIDs with no overhead.

It doesn't matter what computing resources your attacker has; the limit is how much your infrastructure can handle, and the author casually overestimates that by about 10 orders of magnitude. So replace 35 minutes with 350 billion minutes, or about 660,000 years.
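
A quick sanity check of those figures, as a Python sketch. The inputs (2^64 requests per second, 10M RPS per server, 128 bits per UUID, 350 billion minutes) are the ones quoted above; the variable names are my own.

```python
REQUESTS_PER_SEC = 2**64        # request rate assumed by the article
SERVER_RPS = 10_000_000         # 10M requests/second per server (assumption from the comment)
UUID_BITS = 128                 # size of one UUID on the wire, no overhead

# Servers needed to absorb the load.
servers = REQUESTS_PER_SEC / SERVER_RPS

# Bandwidth needed just to receive the UUIDs, in terabits per second.
bandwidth_tbps = REQUESTS_PER_SEC * UUID_BITS / 1e12

# 350 billion minutes expressed in years.
MINUTES_PER_YEAR = 60 * 24 * 365.25
years = 350e9 / MINUTES_PER_YEAR

if __name__ == "__main__":
    print(f"servers:   {servers:.3e}")
    print(f"bandwidth: {bandwidth_tbps:.3e} Tb/s")
    print(f"years:     {years:,.0f}")
```

The results land at roughly 1.8 trillion servers, a bit over 2 billion Tb/s, and about 660,000 years, which matches the numbers in the comment.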

Thanks for this. I thought I must be missing something because this seems like such an obvious point.

I find it hard to believe that there is a problem with a (cryptographically random) 122-bit session key, considering that a brute-force attack on it will result in a DDoS, which is obviously self-limiting.

Lots of people here are saying “never use a uuid for a session key”, but I don’t understand this. What’s the accepted entropy for a session key?

I think the even more absurd rec is to use 160 bits as a "sweet spot"? Why? Who said that? Which real world scenarios? Why not 159 or 161...

Then you realize the author is just talking out their rear end with no thought...

"Yes I often find my cracking buddies with their super computers just give up hacking my online user service when I bumped my user token length from 159 to 160 length", said nobody, ever.

> "Yes I often find my cracking buddies with their super computers just give up hacking my online user service when I bumped my user token length from 159 to 160 length", said nobody, ever.

Reminds me of this sketch: https://youtu.be/IHfiMoJUDVQ

Even they shouldn't need to be concerned much with collisions. Wikipedia suggests[0] "generating 1 billion UUIDs per second for about 85 years". Is it possible? Sure. Is it likely? Not really.

[0]: https://en.wikipedia.org/wiki/Universally_unique_identifier#...
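
The figure can be reproduced with the standard birthday-bound approximation: for a space of N values, a 50% chance of at least one collision needs about sqrt(2 · N · ln 2) draws. A small Python sketch, assuming 122 random bits per v4 UUID and the 1 billion per second rate from the quote (the names are mine):

```python
import math

RANDOM_BITS = 122               # random bits in a version 4 UUID
RATE_PER_SEC = 1_000_000_000    # 1 billion UUIDs per second, as in the quote
SECONDS_PER_YEAR = 365.25 * 24 * 3600

space = 2**RANDOM_BITS

# Birthday approximation: draws needed for a ~50% chance of any collision.
draws_for_50_percent = math.sqrt(2 * space * math.log(2))

years = draws_for_50_percent / RATE_PER_SEC / SECONDS_PER_YEAR

if __name__ == "__main__":
    print(f"draws: {draws_for_50_percent:.3e}")
    print(f"years: {years:.1f}")
```

That works out to about 2.7 × 10^18 UUIDs and a span in the mid-80s of years, in line with the Wikipedia figure.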

I guess from the article it's not just collisions, but the (significantly more likely) problem of guessing a UUID that's valid (out of all the issued tokens).

But yeah, even that is very, very low risk. The article had to make some outrageously pessimistic assumptions to get its "38 minutes!" number: issuing a million tokens a second with two-year validity, and getting attacked with the entire hash rate of the bitcoin mining community. And having enough backend capacity to handle all those requests while at the same time having no observability or rate limiting to mitigate a brute-force attack.
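
To see how much the answer depends on those assumptions, here is a rough Python sketch of the expected time for an attacker to hit any one valid token by random guessing. It treats each guess as an independent draw with success probability (valid tokens) / 2^122. The function name and the example attack rate of 10M guesses per second are my assumptions, not numbers from the article.

```python
def expected_seconds_to_guess(valid_tokens: float,
                              guesses_per_sec: float,
                              bits: int = 122) -> float:
    """Mean time until a random guess matches one of the valid tokens.

    Each guess succeeds with probability valid_tokens / 2**bits, so the
    expected number of guesses is the reciprocal of that probability.
    """
    expected_guesses = 2**bits / valid_tokens
    return expected_guesses / guesses_per_sec


SECONDS_PER_YEAR = 365.25 * 24 * 3600

# A million tokens issued per second, each valid for two years.
valid = 1_000_000 * 2 * SECONDS_PER_YEAR

# Hypothetical attacker limited to what one very fast server could accept.
seconds = expected_seconds_to_guess(valid, guesses_per_sec=10_000_000)

if __name__ == "__main__":
    print(f"{seconds / SECONDS_PER_YEAR:.3e} years")
```

Plugging in a guess rate that the defending infrastructure could actually accept, rather than a global hash rate, moves the result from minutes to geological time.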

> I guess from the article it's not just collisions, but the (significantly more likely) problem of guessing a UUID that's valid (out of all the issued tokens).

Assuming random UUIDs:

If you're counting all the UUIDs anyone makes, then valid<->attacker matches are a subset of all possible collisions and therefore less likely.

If your baseline is only the collisions between valid UUIDs, then whether an attacker is more or less likely to collide depends on whether they're generating UUIDs at least half as fast as the system they're attacking.
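
The "half as fast" threshold falls out of counting pairs. With n valid UUIDs there are n(n-1)/2 valid-valid pairs that could collide, while an attacker making m guesses creates n·m valid-attacker pairs. A minimal Python sketch of that comparison, assuming uniformly random 122-bit values (the function names are illustrative):

```python
SPACE = 2**122  # possible values of a random v4 UUID


def expected_internal_collisions(n: int, space: int = SPACE) -> float:
    """Expected number of colliding pairs among n valid UUIDs."""
    return n * (n - 1) / 2 / space


def expected_attacker_hits(n: int, m: int, space: int = SPACE) -> float:
    """Expected number of matches between m attacker guesses and n valid UUIDs."""
    return n * m / space


if __name__ == "__main__":
    n = 1_000_001
    for m in (n // 4, (n - 1) // 2, n):
        internal = expected_internal_collisions(n)
        hits = expected_attacker_hits(n, m)
        print(f"m={m:>9}: internal={internal:.3e} attacker={hits:.3e}")
```

The two expectations are equal at m = (n - 1) / 2, so an attacker guessing faster than about half the system's issue rate is the more likely source of a match.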

> That's near enough to true for anyone not operating at "web scale". FAANG/BAT engineers need to care.

I’d argue even then it’s really not much of a concern. You’d need to generate 1 billion v4 UUIDs per second for over 75 years to have a 50% chance of there being a single collision.

You can generate sequential UUIDs, IIRC, that’s the best way to store them in a db and still have good partitioning/indexing. I don’t use UUIDs often, but I vaguely remember researching this problem space at some point.

I think most languages let you choose which version of UUID you want, with most defaulting to the random version (I think 4?).

There are other versions that are sequential/time-based though, but using these could open the door to de-obfuscating whatever data you wanted to protect via UUIDs in the first place (like how many sales orders you receive per hour, etc).
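
As an illustration of that leak, a version 1 UUID embeds its creation time, and anyone holding the UUID can read it back. A short Python sketch using the standard `uuid` module; the helper name is mine, and the offset constant is the number of 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.

```python
import uuid
from datetime import datetime, timezone

# 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
GREGORIAN_OFFSET = 0x01B21DD213814000


def uuid1_created_at(u: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry this timestamp layout")
    unix_seconds = (u.time - GREGORIAN_OFFSET) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


if __name__ == "__main__":
    order_id = uuid.uuid1()
    print(order_id, "was created at", uuid1_created_at(order_id))
```

Collect enough of these IDs and you can bucket them by hour, which is exactly the orders-per-hour leak described above. A version 4 UUID has no such field.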

I don’t think uuids are designed for obfuscation, though they certainly help with that as a side effect. I could be wrong though, I’ve never looked into it.

They (randomized type 4 UUIDs) obfuscate as a side effect because they are much more difficult to guess due to their randomness. As the article points out though, they are not impossible to guess... but it will come down to your risk tolerance and what the UUIDs are "protecting".

People like to reach for UUIDs when obfuscation is needed because inventing your own duplicate-aware random string algorithm isn't what most folks want to spend their time thinking about. Plus, these days, many databases come with UUID-aware data types that make using UUIDs fairly straightforward.

UUIDs are a vast improvement over integers for preventing simple attacks like +/-ing the id and seeing what happens.

But then you're back to collisions, and you may as well be using longs.

I think v7 uses microseconds since epoch + random data. The odds of a collision should be practically 0; you'd be more likely to find a sha256 collision.

> more likely to find a sha256 collision.

This is obviously, and egregiously, false.

I don't know. You'd need quite a number of threads + machines generating uuids in the exact same microsecond to get an opportunity for a collision. It doesn't seem obviously false.

I didn't say a collision is easy. I said it's obviously false that it's harder than colliding sha256, a space roughly 95780971304118053647396689196894323976171195136475136 times larger.
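
For anyone wondering where a 53-digit ratio like that comes from: it has the size of 2^176, which is what you get by dividing the 2^256 sha256 output space by 2^80. The 80 random bits per timestamp tick is my inference from the size of the number, not something stated above. A Python sketch of the arithmetic:

```python
SHA256_BITS = 256
ASSUMED_RANDOM_BITS = 80   # assumption: random bits left after the timestamp

# Python integers are arbitrary precision, so this division is exact.
ratio = 2**SHA256_BITS // 2**ASSUMED_RANDOM_BITS

if __name__ == "__main__":
    print(ratio)
    print(len(str(ratio)), "digits")
```

Even if many generators do collide on the same timestamp, the remaining random field is still astronomically smaller than the sha256 space.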