| HN Mirror



	by cetra3 1310 days ago
	If you're using them for unguessable random strings then yeah, they're not ideal. If you're using them for providing a unique id in a distributed system, with very little chance of collision & fitting them in a db column, then they are great.

9 comments

ilyt 1310 days ago

Pretty much, my first reaction was "people use UUIDs for session tokens? Why?"

Seems like the author made some bad choices in previous systems and has only now figured out why, tbh.

link

jmull 1310 days ago

I’m not sure it’s bad to use a random UUID (v4) generated with a random number generator designed for cryptography for a validated session key.

A guess means making a request to your server. You won’t be concerned with ~2^64 guesses per second.

I’m not suggesting anyone do it, if you have a choice. (Especially consider you’ll probably have to go through the trouble to justify it to people who read articles like this but don’t understand the math.) But if you have an existing system, consider whether you can let it stand.

link

ilyt 1309 days ago

Well, an existing system (one that for whatever reason can't do CSPRNG -> base64) could always concatenate two UUIDs.
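
For reference, the CSPRNG -> base64 route is short in most languages. A minimal Python sketch, assuming the standard-library `secrets` module; the 20-byte (160-bit) size and the function name are illustrative choices, not something taken from any system discussed here:

```python
import secrets

# 20 bytes from the OS CSPRNG = 160 bits of entropy (size is an illustrative assumption).
TOKEN_BYTES = 20

def new_session_token() -> str:
    """Return a URL-safe base64 token drawn from the OS CSPRNG."""
    return secrets.token_urlsafe(TOKEN_BYTES)

token = new_session_token()
print(token, len(token))
```

The result is shorter than a 36-character UUID string while carrying more random bits than a v4 UUID's 122.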

link

epicureanideal 1310 days ago

Depending on the UUID algorithm: some versions are generated from cryptographically strong randomness, in which case it would make sense.

link

ksidudwbw 1310 days ago

what if you sha it?

link

nine_k 1310 days ago

Adding a crypto hash lets you check that the hashed value was not changed, because finding another value with the same hash is hard, by definition of a crypto hash.

But here the problem is not forging an ID, it's guessing an ID, and hashing does not widen the search space, does not increase randomness.

link

dspillett 1310 days ago

> Adding a crypto hash

I think the poster you replied to was meaning using the hash output as the token, not that you would maintain the original token and a salted hash for verification.

If they are thinking SHA(GenerateUUID()) would have better entropy then they are incorrect even though all SHA variants output more than the 128-bits in the source UUID. I assume such misunderstanding comes from the fact that some PRNGs are based upon repeated application of cryptographically assured hash functions against the seed data.

Using some irreversible transform would solve the issue of potentially leaking information in the UUIDs, but if that is an issue then instead use a UUID variant based on purely random data (v4?), as that would be more efficient and would not result in a value that is longer but contains no extra entropy.

link

WirelessGigabit 1310 days ago

That actually reduces the usefulness as you're hashing the data into a smaller length.

link

unlikelymordant 1310 days ago

It seems UUIDs are 128 bits, while SHA-1 is 160 bits. There are also SHA-256 and SHA-512 for longer hashes. So there shouldn't be any worries about the hash being shorter.

link

jchw 1310 days ago

Rereading I am guessing you're merely pointing out that the comment regarding shortening the length is untrue. If you already understand the entropy issue here, please treat my "you"s as royal you's.

You have a 128 bit value. That's 128 binary digits. Each digit can be zero or one. That means you have 2^128 possible distinct values. (Ignoring the fixed bits in UUIDs since it's not important for sake of this argument.)

Now you use a one-way cryptographic hash on top, like sha256. This will return a specific hash for any given input. It is always the same for a specific given input, and it is nearly always distinct. The output that a hash has may have more bits, but the number of distinct values can't increase; it can only ever decrease. That's because you could only ever give it 2^128 different values. How could it ever return more outputs if each input corresponds to one output?

To make it more clear, let's say you have a database where you want to store a customer's zip code so you can use it as some kind of validation later on to ensure it matches, but you don't want to store it in plaintext, so you hash it. The hash is 160 bits. Secure, right? Wrong. There are fewer than 50,000 zip codes. It would be trivial to calculate the hash of every single one and use the results as a simple hashmap from hashed value to plaintext.

You may be thinking this is impractical for an input domain as large as 2^128, but realistically it only adds a slight roadblock. Knowing the only valid values will be hashed UUIDs, instead of picking 160 random bits, you'd be much better off picking a random UUID, hashing it, and trying that for each attempt.
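
The zip code case can be sketched in a few lines. This is a minimal Python illustration, assuming unsalted SHA-1 and enumerating every 5-digit string (a superset of the real zip codes); `sha1_hex` and `lookup` are hypothetical names:

```python
import hashlib

def sha1_hex(value: str) -> str:
    """Unsalted 160-bit SHA-1 digest, hex encoded."""
    return hashlib.sha1(value.encode("ascii")).hexdigest()

# Precompute hash -> plaintext for all 100,000 possible 5-digit codes.
lookup = {sha1_hex(f"{n:05d}"): f"{n:05d}" for n in range(100_000)}

stored_hash = sha1_hex("90210")    # what the database column would hold
print(lookup[stored_hash])         # the original zip code is recovered
```

The digest is 160 bits wide, but only 100,000 distinct digests can ever occur, so the output length says nothing about how hard the value is to guess.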

link

markatto 1310 days ago

Yes, some hashes might not meaningfully hurt it, but they won’t add any entropy, which is the real problem.

link

ak39 1310 days ago

Not being snarky: what's the risk of using UUIDs for session tokens if they are created by the server/db and are always verified by server (db) (for authorisation etc)?

link

jve 1310 days ago

Well, V4 UUIDS per wiki are pretty random, but your generated UUID could actually use your MAC address and current time to be globally unique. So, less entropy. Just use them as a (globally) unique thing but not as a secret.

link

whizzter 1310 days ago

Basically, know your UUID generator type. V1, V2, V6 and V7 are MAC/time dependent and more useful for, e.g., DB keys, whilst V4 is more useful for things that should actually be secret.
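
A quick way to check what a generator hands back, sketched with Python's standard `uuid` module. The node value is a made-up MAC address passed explicitly for illustration:

```python
import uuid

# Version 1: embeds a node (normally the MAC address) and a timestamp.
v1 = uuid.uuid1(node=0x001122334455, clock_seq=0)
# Version 4: random bits apart from the fixed version and variant fields.
v4 = uuid.uuid4()

print(v1.version, hex(v1.node), v1.time)   # node and timestamp are readable from the ID
print(v4.version)
```

Anything a v1 UUID was built from can be read straight back out of it, which is why it works as an identifier but not as a secret.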

link

Gordonjcp 1310 days ago

So there's nothing actually wrong with UUIDs as secrets, if you know what you're doing and how to mitigate the risks?

So pretty much the same as every other damn thing in software that gets an "X Considered Harmful" article? :-D

link

mort96 1310 days ago

I would trust a reputable cryptographic random number generator library to really care about generating truly unguessable, high entropy cryptography-grade random numbers. I would trust a reputable UUID library to generate a UUIDv4 which is random enough to not produce a collision. I would not trust a reputable UUID library to generate truly unguessable, high entropy cryptography-grade UUIDv4s.

link

dahfizz 1310 days ago

Not really. The article's point is that even a v4 UUID (the random one) doesn't have as much randomness as other options, and it has a much less compact representation.

UUIDs are not designed to be secrets, so they are a poor choice. They'll probably work, but there are better options.

link

scott_w 1310 days ago

If you know what you're doing and mitigating the risks, you don't waste your time trying to use UUIDs for secrets. Therefore people using UUIDs for secrets, by definition, don't know what they're doing and certainly aren't mitigating the risks.

link

hsuduebc2 1310 days ago

From my experience even from bigger companies it is sometimes common practice.

link

corytheboyd 1310 days ago

Yeah I don't really get the point of this article, if you need random values of a specific size don't use uuid, it's literally specified to be one exact length and format.

link

ejb999 1310 days ago

>>Yeah I don't really get the point of this article,

To get clicks?

link

corytheboyd 1310 days ago

You're not wrong lol

link

scott_w 1310 days ago

The number of comments saying "using UUIDs for secrets isn't that bad" suggests this article needs to be written...

link

tejtm 1310 days ago

one exact length and five "versions" of the format (so far)

https://en.wikipedia.org/wiki/Universally_unique_identifier#...

link

adileo 1310 days ago

I made a comparison list with the most known uuids out there, a couple of days ago, it was quite fun discovering all the different kinds of uid and their pros/cons.

https://adileo.github.io/awesome-identifiers/

link

bmn__ 1309 days ago

https://datatracker.ietf.org/doc/html/draft-peabody-dispatch...

link

JimDabell 1310 days ago

KSUIDs are fairly popular and missing from your list:

https://github.com/segmentio/ksuid

link

8n4vidtmkvmk 1310 days ago

What's the resolution on those? 32 bits, 100 years... that's seconds, right? Doesn't sound excellent for time ordering. 100 years also seems a little short, but at least I'll be dead.

link

sekh60 1310 days ago

Don't look at it as being your problem in 100 years, but as helping employment in 100 years and helping the economy ;)

link

djbusby 1310 days ago

ULID example should be in uppercase.

Love this chart tho.

link

throwawaymaths 1310 days ago

Also most well-designed systems only use the UUID as the representation format and use raw bits in performance-critical parts.

link

lolinder 1310 days ago

The raw bits are the UUID, the hex string is just a human-readable representation that also plays nicely with JSON.
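
To make the distinction concrete, here is a small sketch using Python's standard `uuid` module, showing the same value as raw bytes, as an integer and as the usual hex string:

```python
import uuid

u = uuid.uuid4()

raw = u.bytes      # 16 raw bytes, what a binary/UUID column would store
text = str(u)      # 36-character hex-and-dashes representation

print(len(raw), len(text), u.int.bit_length() <= 128)

# All three forms round-trip to the same value.
assert uuid.UUID(bytes=raw) == uuid.UUID(text) == uuid.UUID(int=u.int) == u
```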

link

throwawaymaths 1310 days ago

Tell that to Django (well, 5 years ago anyway iirc, don't know what it does now). Pretty sure it used to store UUIDs as string columns in your SQL.

link

mort96 1310 days ago

I suppose Django wouldn't consider the speed gains of using raw integers in the database worth the hassle of dealing with binary data when you have to manually deal with the database somehow. I usually use string columns for UUIDs myself for the same reason.

It's also not given that it'll be a performance benefit, you probably receive UUIDs as strings from some client and probably want to return UUIDs as strings to the client, and that conversion isn't free.

link

lolinder 1310 days ago

Yep, looks like it does the right thing in PostgreSQL but not anywhere else [0].

https://docs.djangoproject.com/en/4.1/ref/models/fields/#uui...

link

throwawaymaths 1310 days ago

I feel like it did strings in postgres too, not too long ago and I had a <brain explode> moment when I worked on a codebase and had to figure out why queries were terrible

link

halhen 1310 days ago

Or to PowerBI, which will cast any UUID to a string even in joins. That cast + string comparisons + killing of indexes is not conducive to performant queries...

link

fsloth 1310 days ago

It’s a 128 bit integer - the serialization format does not change the fact.

link

_3u10 1310 days ago

Use uint128_t instead.

link

BerislavLopac 1310 days ago

It is also highly recommended that you include a check digit into it, to minimize the chance of a collision. I've used https://arthurdejong.org/python-stdnum for that purpose.

link

sokoloff 1310 days ago

I don't see how a check digit minimizes the chance of collision. (Here, I'm assuming that a check digit is calculated from the other digits. What am I thinking about incorrectly?)

link

georgemcbay 1310 days ago

Looking at the docs for the library linked, it appears to be a Verhoeff algorithm check digit... so yeah, you're correct.

This is effectively a simplistic stand-in for a CRC type system -- useful to detect if the data has been corrupted, but not useful to avoid collisions.

link

TedDoesntTalk 1310 days ago

And if someone is worried about UUID collisions, they need to rethink their priorities in life.

link

BerislavLopac 1309 days ago

You are correct, this should teach me not to write comments when I'm too tired. :/

The check digit wouldn't really help with collisions, since if the strings are the same the digit will be too. They are primarily useful when we need to ensure correctness on human input.

link

Alupis 1310 days ago

There's probably a non-trivial number of folks that equate a UUID with "unguessable" given their appearance. They are, after all, not sequential and using them to obscure things like number of users (using a UUID in place of an incrementing number) seems like a natural fit.

Given how easy it is to generate a UUID in most languages, and given the low likelihood of a collision within a system - it wouldn't be a huge leap to think UUID's could replace homebrewed random string generators for things like password reset tokens, etc.

link

bigiain 1310 days ago

> There's probably a non-trivial amount of folks that equate a UUID with "unguessable" given their appearance.

That's near enough to true for anyone not operating at "web scale".

FAANG/BAT engineers need to care. My systems with 10s or 100s of thousands of users (or, you know, a few thousand users tops) are without doubt going to be re-written (probably several times) well before I have to worry about having so many UUIDs in the wild that this becomes a reasonable thing to worry about.

For me, at the scale of systems I run (or will conceivably run in the medium term future), I think the simplicity/understandability of code that uses native language UUID functions is "the right thing". Whoever does the next big rewrite to support a few million MAU will be thankful they don't have to work out WTF I was thinking when I decided to roll my own random access tokens.

link

ndriscoll 1310 days ago

I doubt FAANG engineers need to care either. Ignoring that the author imagines 8k IoT devices per living human for one service, 2^64 requests per second is an absurd number to use. Assuming one server can do 10M RPS, you'd need 1.8 trillion servers to handle that load. You'd also need over 2 billion Tb/s of bandwidth to receive just the UUIDs with no overhead.

It doesn't matter what computing resources your attacker has; the limit is how much your infrastructure can handle, and the author casually overestimates that by about 10 orders of magnitude. So replace 35 minutes with 350 billion minutes, or about 660,000 years.
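
The arithmetic is easy to reproduce. A minimal Python sketch using only the figures quoted above (2^64 guesses per second, 10M requests per second per server, 128 bits per UUID):

```python
GUESSES_PER_SEC = 2**64        # attack rate attributed to the article
SERVER_RPS = 10_000_000        # 10M requests/s per server, as assumed above
UUID_BITS = 128

servers = GUESSES_PER_SEC / SERVER_RPS
terabits_per_sec = GUESSES_PER_SEC * UUID_BITS / 1e12
years = 350e9 / (60 * 24 * 365.25)   # 350 billion minutes expressed in years

print(f"servers needed:  {servers:.2e}")
print(f"bandwidth, Tb/s: {terabits_per_sec:.2e}")
print(f"years:           {years:,.0f}")
```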

link

doctor_eval 1310 days ago

Thanks for this. I thought I must be missing something because this seems like such an obvious point.

I find it hard to believe that there is a problem with a (cryptographically random) 122 bit session key considering that a brute force attack on it will result in a DDoS, which is obviously self limiting.

Lots of people here are saying “never use a uuid for a session key”, but I don’t understand this. What’s the accepted entropy for a session key?

link

smaudet 1310 days ago

I think the even more absurd rec is to use 160 bits as a "sweet spot"? Why? Who said that? Which real world scenarios? Why not 159 or 161...

Then you realize the author is just talking out their rear end with no thought...

"Yes I often find my cracking buddies with their super computers just give up hacking my online user service when I bumped my user token length from 159 to 160 length", said nobody, ever.

link

sidpatil 1310 days ago

> "Yes I often find my cracking buddies with their super computers just give up hacking my online user service when I bumped my user token length from 159 to 160 length", said nobody, ever.

Reminds me of this sketch: https://youtu.be/IHfiMoJUDVQ

link

dpcx 1310 days ago

Even they shouldn't need to be concerned much with collisions. Wikipedia suggests[0] "generating 1 billion UUIDs per second for about 85 years". Is it possible? Sure. Is it likely? Not really.

[0]: https://en.wikipedia.org/wiki/Universally_unique_identifier#...

link

bigiain 1310 days ago

I guess from the article it's not just collisions, but the (significantly more likely) problem of guessing a UUID that's valid (out of all the issued tokens).

But yeah, even that is very, very low risk. The article had to make some outrageously pessimistic assumptions to get its "38 minutes!" number: issuing a million tokens a second with two-year validity, getting attacked with the entire hash rate of the bitcoin mining community, and having enough backend capacity to handle all those requests while at the same time having no observability or rate limiting to mitigate a brute force attack.

link

Dylan16807 1310 days ago

> I guess from the article it's not just collisions, but the (significantly more likely) problem of guessing a UUID that's valid (out of all the issued tokens).

Assuming random UUIDs:

If you're counting all the UUIDs anyone makes, then valid<->attacker matches are a subset of all possible collisions and therefore less likely.

If your baseline is only the collisions between valid UUIDs, then whether an attacker is more or less likely to collide depends on whether they're generating UUIDs at least half as fast as the system they're attacking.

link

jsjohnst 1310 days ago

> That's near enough to true for anyone not operating at "web scale". FAANG/BAT engineers need to care.

I’d argue even then it’s really not much a concern. You’d need to generate 1 billion UUID v4’s per second for over 75 years to have a 50% chance of there being a single collision.
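
A rough check of that figure, as a Python sketch using the standard birthday approximation and assuming 122 random bits per v4 UUID; it comes out at roughly 86 years, in line with the Wikipedia number quoted elsewhere in the thread:

```python
import math

RANDOM_BITS = 122                       # v4: 128 bits minus 6 fixed version/variant bits
RATE_PER_SEC = 1_000_000_000            # one billion UUIDs per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Birthday approximation: n ~= sqrt(2 * N * ln(1 / (1 - p))), here with p = 0.5.
n_for_half = math.sqrt(2 * 2**RANDOM_BITS * math.log(2))
years = n_for_half / RATE_PER_SEC / SECONDS_PER_YEAR

print(f"UUIDs needed for a 50% collision chance: {n_for_half:.2e}")
print(f"years at 1e9 per second: {years:.1f}")
```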

link

withinboredom 1310 days ago

You can generate sequential UUIDs, IIRC, that’s the best way to store them in a db and still have good partitioning/indexing. I don’t use UUIDs often, but I vaguely remember researching this problem space at some point.

link

Alupis 1310 days ago

I think most languages let you choose which version of UUID you want - with most defaulting to the random version (I think 4?).

There are other versions that are sequential/time-based though, but using these could open the door to de-obfuscating whatever data you wanted to protect via UUID's in the first place (like how many sales orders you receive per hour, etc).

link

withinboredom 1310 days ago

I don’t think uuids are designed for obfuscation, though they certainly help with that as a side effect. I could be wrong though, I’ve never looked into it.

link

Alupis 1310 days ago

They (randomized type 4 UUID's) obfuscate as a side effect because they are much more difficult to guess due to their randomness. As the article points out though, they are not impossible to guess... but it will come down to your risk tolerance and what the UUID's are "protecting".

People like to reach for UUID's when obfuscation is needed because inventing your own duplicate-aware random string algorithm isn't what most folks want to spend their time thinking about. Plus, these days, many databases come with UUID-aware data types that make using UUID's fairly straight forward.

link

edgyquant 1310 days ago

UUIDs are a vast improvement over integers for preventing simple attacks like +/-ing the id and seeing what happens.

link

mrkeen 1310 days ago

But then you're back to collisions, and you may as well be using longs.

link

withinboredom 1310 days ago

I think v7 uses microseconds since epoch + random data. The odds of a collision should be practically 0, or more likely to find a sha256 collision.

link

morelisp 1310 days ago

> more likely to find a sha256 collision.

This is obviously, and egregiously, false.

link

withinboredom 1310 days ago

I don't know. You'd need quite a number of threads + machines generating uuids in the exact same microsecond to get an opportunity for a collision. It doesn't seem obviously false.

link

Waterluvian 1310 days ago

“Moving Away From Misusing UUIDs”

link

hn_user2 1310 days ago

My only wish is that UUIDs were sortable and still contained their timestamp. When bug hunting, sometimes things become a little more obvious when there is an exact start and end to ids with issues.

link

myvoiceismypass 1310 days ago

There are KSUIDs that aim to satisfy this

A go ref impl: https://github.com/segmentio/ksuid

link

scrollaway 1310 days ago

Also, UUIDv6, v7 and v8. Still a draft.

https://datatracker.ietf.org/doc/html/draft-peabody-dispatch...

link

dexwiz 1310 days ago

Depends on the version used. Some of them do encode time. But since people don’t like to leak information they use the random version (4).

link

andreareina 1310 days ago

They're little endian so not sortable

link

sgtnoodle 1310 days ago

What does that have to do with anything?

link

andreareina 1310 days ago

>>>> My only wish is that UUIDs were sortable and still contained their timestamp. When bug hunting, sometimes things become a little more obvious when there is an exact start and end to ids with issues.

>>> Depends on the version used. Some of them do encode time.

Encoding time isn't enough, it has to be big endian (unless you write a special sorting function for uuids). Timestamped uuids store the timestamp as [timestamp_low, timestamp_mid, version(!), timestamp_high][1] which doesn't sort right.

[1] https://en.m.wikipedia.org/wiki/Universally_unique_identifie...
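
The field order problem can be shown with Python's standard `uuid` module. This is a sketch that builds two version-1 UUIDs by hand from chosen 60-bit timestamps; the node and clock sequence are dummy values and `v1_from_timestamp` is a hypothetical helper:

```python
import uuid

def v1_from_timestamp(ts: int) -> uuid.UUID:
    """Build a version-1 UUID from a 60-bit timestamp (dummy node and clock_seq)."""
    time_low = ts & 0xFFFFFFFF                          # lowest 32 bits come first
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000    # version 1 in the top nibble
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             0x80, 0x00, 0x001122334455))

earlier = v1_from_timestamp(0x1_FFFF_FFFF)
later = v1_from_timestamp(0x2_0000_0000)

print(earlier)
print(later)
print("string order matches time order:", str(earlier) < str(later))
print("timestamp order:", earlier.time < later.time)
```

Because the low 32 bits of the timestamp lead the string, the later UUID sorts first as text even though its embedded timestamp is larger.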

link

pmontra 1310 days ago

According to that Wikipedia page the binary representation of UUID 1 is big endian. It's the date-time and MAC address version.

link

mirzap 1310 days ago

You can use ULID and store it as UUID since they are the same size. You can check this article for the details:

https://blog.daveallie.com/ulid-primary-keys

link

ilikehurdles 1310 days ago

UUIDv7 is sortable by time but I’m not sure if it’s possible to derive the time stamp from the UUID somehow.

link

jayknight 1310 days ago

The first 48 bits of UUIDv7 are the number of milliseconds since the epoch.
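
A sketch of reading the timestamp back, assuming the layout in the published UUID specification (RFC 9562), where the leading 48 bits hold Unix time in milliseconds. Since older Python versions have no built-in v7 generator, `make_uuid7` below is a hand-rolled, hypothetical stand-in:

```python
import os
import time
import uuid

def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Hand-rolled v7-style UUID: 48-bit ms timestamp, then version, variant and random bits."""
    rand = int.from_bytes(os.urandom(10), "big")           # 80 random bits
    value = ((unix_ms & 0xFFFFFFFFFFFF) << 80) | rand
    value = (value & ~(0xF << 76)) | (0x7 << 76)           # version field = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)           # variant field = 0b10
    return uuid.UUID(int=value)

def timestamp_ms(u: uuid.UUID) -> int:
    """Recover the millisecond timestamp from the top 48 bits."""
    return u.int >> 80

now_ms = time.time_ns() // 1_000_000
u = make_uuid7(now_ms)
print(u, timestamp_ms(u) == now_ms)
```

With the timestamp in the most significant bits, plain byte or string ordering also follows creation time at millisecond granularity.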

link

kube-system 1310 days ago

I’ve always liked the pattern of putting timestamps on any objects in my DBs.

link

vbezhenar 1310 days ago

I implemented it myself. Was a little bit tricky, but not rocket science.

link

Too 1310 days ago

MongoDB's ObjectId has this property.

link

human 1310 days ago

Something I don't understand: how are UUIDs not safe given that they are probably better than 99.9999% of passwords generated by users?

link

monocasa 1310 days ago

Does your UUID library use a cryptographic safe RNG?

link

lolinder 1310 days ago

Java's does, and that's the implementation the article discusses.

link

lll-o-lll 1310 days ago

But this is the point though, UUID is the wrong tool for the job. You want a cryptographically random blob of entropy and you reach for a UUID because it happens to contain some of that in a specific implementation.

UUIDs are for uniqueness and involve implicit trust. Cryptographic libraries are what you need to generate entropy blobs without weakening security/confusing the next developer etc.

link

rr808 1310 days ago

UUIDs are nearly half the MAC address of the server, plus a timestamp. They are in no way random.

link

throwanem 1310 days ago

That's UUID v1. The random one that everyone uses is v4.

link

kube-system 1310 days ago

I have seen some common libraries that default to v1, so I can see why there’s some confusion in here.

link

nordsieck 1310 days ago

> Something I don't understand: how are UUIDs not safe given that they are probably better than 99.9999% of passwords generated by users?

UUIDs are 128 bits. Which is beat by a 5 character a-z random string.

It's certainly possible that they're better than the median password - especially if there isn't a check against a common password list. But it's pretty easy for user chosen passwords to be much, much better.

I strongly doubt that your 6 9s estimate is accurate.

link

lolinder 1310 days ago

> UUIDs are 128 bits. Which is beat by a 5 character a-z random string.

A sibling gives the actual math that shows how wrong this is, but this doesn't even pass the most rudimentary sniff test. The most common encoding for a lowercase string would be in 8 bits per character, so a 5 character string can get you at most to 40 bits.

And that's assuming you allowed every one of the 256 possible characters. You're restricting it down to 26 characters.

EDIT: I was curious, so I checked. Even if you allowed every current Unicode character, 5 characters only gets you to ~86 bits of entropy:

log2(149186^5) ~= 85.9

As for the original 6 nines claim, I also calculated the entropy for a 14 character random password that allows all 62 letters+numbers plus 8 special characters:

log2(70^14) ~= 85.8

It's not until 20 characters that it matches a UUID v4. So, yeah, I'm okay with OP's 6 nines.
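
These numbers are easy to re-derive. A minimal Python sketch, assuming every character is chosen uniformly and independently; `entropy_bits` is a hypothetical helper:

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random string: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)

UUID_V4_BITS = 122   # random bits in a v4 UUID

print(f"a-z, 5 chars:            {entropy_bits(26, 5):.1f}")
print(f"all Unicode, 5 chars:    {entropy_bits(149186, 5):.1f}")
print(f"70 symbols, 14 chars:    {entropy_bits(70, 14):.1f}")

# Shortest 70-symbol password that reaches the entropy of a v4 UUID.
min_len = math.ceil(UUID_V4_BITS / math.log2(70))
print(f"length needed to match a v4 UUID: {min_len}")
```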

link

pmontra 1310 days ago

128 bits are 16 bytes, which is at best a binary string of 16 characters. Remove some bits for the not random parts of the UUID and still you don't get down to 5 characters. Furthermore "a 5 character a-z random string" is less than 5 bits per character. Make them less than 6 by adding A-Z and the ten digits.

About storage, at least PostgreSQL has been using 16 bytes of storage since at least version 8 many years ago.

https://www.postgresql.org/docs/current/datatype-uuid.html

https://www.jacoelho.com/blog/2021/06/postgresql-uuid-vs-tex...

link

prutschman 1310 days ago

A 5 character a-z random string has log2(26^5) =~ 23.5 bits of entropy, way less than 128.

link

andreareina 1310 days ago

The best case for a 5 ascii character password is 7 * 5 = 35 bits.

link

dheera 1310 days ago

Also UUID v3 and v5 produce IDs from identifiers such as URLs which can be quite useful if you want two different systems to generate the same exact UUID given knowledge of the same URL.

For example, in a REST system that needs UUIDs I'd use the REST URL of the object as the UUID.
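
A minimal sketch of the name-based approach with Python's standard `uuid` module; the URL is a made-up example:

```python
import uuid

url = "https://api.example.com/orders/42"   # hypothetical resource URL

# Version 5 hashes (namespace, name) with SHA-1, so the result is deterministic.
id_on_system_a = uuid.uuid5(uuid.NAMESPACE_URL, url)
id_on_system_b = uuid.uuid5(uuid.NAMESPACE_URL, url)

print(id_on_system_a, id_on_system_a == id_on_system_b, id_on_system_a.version)
```

The flip side is that anyone who knows the URL can compute the same UUID, so these are identifiers and never secrets.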

link

echelon 1310 days ago

The best format:

{opaqueTokenTypePrefix}_{crockfordEncodedEntropy}

Also: pass token through a bad words and "credit card lookalike" filter.

Optionally encode author cluster/region details in the low order bytes to resolve before eventual consistency in active-active systems.
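
One possible reading of that format, as a Python sketch. It re-maps the standard library's base32 output onto Crockford's alphabet; the `sess` prefix, the 20-byte size and the helper name are illustrative assumptions, and the bad-word and credit-card filters are left out:

```python
import base64
import secrets

RFC4648_B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
CROCKFORD_B32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"   # no I, L, O or U
_TO_CROCKFORD = str.maketrans(RFC4648_B32, CROCKFORD_B32)

def new_token(prefix: str, entropy_bytes: int = 20) -> str:
    """Return '{prefix}_{entropy}' with the entropy in Crockford's base32 alphabet."""
    raw = secrets.token_bytes(entropy_bytes)              # bytes from the OS CSPRNG
    encoded = base64.b32encode(raw).decode("ascii").rstrip("=")
    return f"{prefix}_{encoded.translate(_TO_CROCKFORD)}"

print(new_token("sess"))
```

The prefix makes leaked tokens easy to recognise and scan for, and the reduced alphabet avoids characters that are easily confused when read aloud or retyped.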

link

Ptchd 1310 days ago

> If you're using them for unguessable random strings then yeah, they're not ideal.

Why? I like to use them for private/secret URLs ...

link