| HN Mirror

by blopker 19 days ago

UUIDs are way overused. There is almost always a better key to use, usually a bigint for databases. If you're making some kind of leaderless distributed data store, then maybe, but even then there are other ID sharding strategies I'd go for first depending on the constraints.

For a single database, bigints are smaller and faster, with less footguns.

UUIDs can be nice for an opaque public ID, however I'd still prefer something like a Sqid for space and usability.

9 comments

Fabricio20 18 days ago

> bigints are smaller and faster, with less footguns

But be careful!! Javascript WILL interpret your bigints as Number() and, for anything above 2^53, silently round them to the nearest representable double without telling you!!!

Famously seen by every Snowflake ID user that has interacted with Javascript, quite an annoying problem.

silvestrov 18 days ago

A good trick is to prefix all such keys with magic, i.e. a couple of letters that identify the type of key.

Then it will always be a string and you will be free to change the format/type of the key in the future to UUID or whatever you like.

zmj 18 days ago

Rule of thumb: if you’re not doing math with a value, it’s not a number.

throw-the-towel 17 days ago

Speaking of which, one of my favourite UX brainfarts is treating text fields where you enter a sum as numbers.

Why, you ask? Let's say you have a number like 10,000 and you want to replace it with 20,000. You delete the leading 1, and boom! The number is now zero, and three of the digits are gone, and you'll have to retype them like you got no other things to do with your life.

myko 18 days ago

I've been preaching this for years but nobody believes me until it bites them in the ass

klysm 18 days ago

Indexes?

Terr_ 17 days ago

> Javascript WILL interpret your bigints as Number()

A similar horror story from PHP, which I discovered by diagnosing a test failure. (Or maybe it was in production? Long ago, can't remember.)

I think the code in question was for some kind of web auth, comparing random 32-character hexadecimal strings. PHP has a "feature" where its == operator falls back to trying certain strings as numbers... and that includes a version with scientific notation. (12000 == "12000" == "12e3")

Such a collision through bad comparison may seem unlikely, but there are two islands of higher odds: 0*10^X is zero for any X, and X*10^0 is one for any X. Finally, leading zeros can be included. ("0e1234" == "00000e1" and "1234e0" == "9e0000")

The fix was simply going to stricter ===, but it definitely reinforced my dislike of "loose" languages.

paulddraper 18 days ago

!!

Node.js drivers will correctly read int64 as string or bigint, not number.

E.g. pg for PostgreSQL

Maybe there’s a buggy driver but I don’t know it.

Fabricio20 18 days ago

Browser!! The browser reads it as Number. If your rest api returns {"id": 1324535222364012585} for example, javascript will try and parse that as number from the response!!!

You can, of course, change the API so that it returns {"id": "1324535222364012585"} instead, and voila, it will no longer be parsed as a number. Or use one of the many other workarounds people have recommended above (like adding a prefix, or using a different encoding). But why does it try to parse a number that's too big and, instead of throwing, just round it without telling you????!
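A minimal sketch of the failure mode described here, using the ID from the comment. It relies only on standard `JSON.parse`, `Number.isSafeInteger`, and `BigInt`; the variable names are illustrative.

```javascript
// The same ID sent as a bare JSON number and as a JSON string.
const asNumber = JSON.parse('{"id": 1324535222364012585}');
const asString = JSON.parse('{"id": "1324535222364012585"}');

// Beyond Number.MAX_SAFE_INTEGER (2^53 - 1) a double cannot hold every
// integer, so the parsed value is silently rounded to a representable one.
console.log(Number.isSafeInteger(asNumber.id));              // false
console.log(String(asNumber.id) === "1324535222364012585");  // false: digits changed

// The quoted form survives intact and can be converted explicitly.
console.log(asString.id === "1324535222364012585");          // true
console.log(BigInt(asString.id) === 1324535222364012585n);   // true
```

No exception is raised at any point, which is why the problem tends to surface only when a lookup by the rounded ID fails.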

paulddraper 18 days ago

Huh? The subject was database drivers.

You seem to be talking about JSON. (Which technically has no limit on number size or precision, but in practice is float64.)

Piezoid 18 days ago

Using a Feistel cipher and base 32 encoding at the boundaries of the system can help catch vibe-coded edge code that attempts to decode identifiers in JavaScript. It also somewhat obfuscates the cardinalities and fill rates of the tables.
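A rough sketch of what such a boundary encoding could look like, assuming IDs are non-negative 64-bit integers held as BigInt. Everything here is hypothetical: the function names, the round keys, and the toy round function, which is not cryptographically secure. "Base 32" below means the 0-9a-v alphabet of `toString(32)`, not RFC 4648 or Crockford base 32.

```javascript
const MASK32 = 0xffffffffn;

// Round function: any deterministic function of (half, key) keeps the
// network invertible. This mixer is a toy and NOT cryptographically secure.
function roundFn(half, key) {
  const mixed = (half * 0x9e3779b1n + key) & MASK32;
  return mixed ^ (mixed >> 15n);
}

// Balanced Feistel network over two 32-bit halves. Because the halves are
// swapped at the end, running it again with the keys reversed undoes it.
function feistel64(id, keys) {
  let left = (id >> 32n) & MASK32;
  let right = id & MASK32;
  for (const key of keys) {
    [left, right] = [right, left ^ roundFn(right, key)];
  }
  return (right << 32n) | left;
}

const KEYS = [0x1234n, 0xbeefn, 0xcafen, 0xf00dn]; // made-up round keys

// Internal bigint -> opaque public string.
const encodeId = (id) => feistel64(id, KEYS).toString(32);

// Opaque public string -> internal bigint.
const decodeId = (text) => {
  // BigInt has no radix-aware parser, so fold the base-32 digits by hand.
  let value = 0n;
  for (const ch of text) value = value * 32n + BigInt(parseInt(ch, 32));
  return feistel64(value, [...KEYS].reverse());
};

console.log(decodeId(encodeId(42n)) === 42n); // true
```

Since a Feistel network is a permutation, distinct IDs always map to distinct public strings, so no collision handling is needed.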

sheept 18 days ago

This can be avoided by supplying a reviver:

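    const json = '{ "a": 9007199254740993 }'
    // context.source is the raw text of a primitive value; it is only passed
    // in runtimes that implement the JSON.parse source text access proposal.
    JSON.parse(json, (_key, value, context) => /^\d+$/.test(context.source) ? BigInt(context.source) : value)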

tyre 18 days ago

Which can be avoided by using UUIDs

jraph 18 days ago

Or by putting that id between quotes so it's a string.

spiffytech 18 days ago

Fortunately we're seeing more JS DB libraries offering to read large numbers as the BigInt type.

shakna 18 days ago

But frustratingly, a JS BigInt is nothing like a BigInt in any other language.

In JS - BigInt is 64bit integer.

In anything else - BigInt is an arbitrarily large integer.

anematode 18 days ago

Hm? JavaScript BigInts are arbitrary precision, and you need to use methods like BigInt.asIntN(64, a) to convert them to 64 bits
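A short check of that claim, using only standard BigInt operations and the `BigInt.asIntN` / `BigInt.asUintN` static methods; the variable names are illustrative.

```javascript
// BigInt values grow as needed: 2^100 is far outside any 64-bit range.
const big = 2n ** 100n;
console.log(big > 2n ** 64n); // true

// BigInt.asIntN(64, x) wraps x into the signed 64-bit range, like an int64 cast.
const maxInt64 = 2n ** 63n - 1n;
console.log(BigInt.asIntN(64, maxInt64) === maxInt64);           // true: fits unchanged
console.log(BigInt.asIntN(64, maxInt64 + 1n) === -(2n ** 63n));  // true: wraps around

// BigInt.asUintN(64, x) does the same for the unsigned range.
console.log(BigInt.asUintN(64, -1n) === 2n ** 64n - 1n);         // true
```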

mort96 18 days ago

I hate this so much because you can’t nicely serialise a BigInt as JSON. Using a string is nicer but it only makes sense where int64 is used as an ID, not where it’s used as a number; and you don’t wanna have to configure this per field per query.

sheept 18 days ago

You can serialize a BigInt by specifying a replacer:

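    const obj = { a: 9007199254740993n }
    // JSON.rawJSON emits the digits unquoted; it is part of the JSON.parse
    // source text access proposal, so check that your runtime supports it.
    JSON.stringify(obj, (_key, value) => typeof value === 'bigint' ? JSON.rawJSON(value.toString()) : value)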

marcosdumay 18 days ago

IMO, I'm leaning toward thinking that having types in your readable serialization format is a mistake, and that they should always be an input to the (de)serializer instead.

nh2 18 days ago

JSON has arbitrary length numbers in the spec only.

Etheryte 18 days ago

This is simply not true? Or maybe I misunderstand what you mean?

JamesSwift 18 days ago

UUIDs also have the nice benefit that it's impossible to query the wrong table with one if you mix up what an FK points to.

chrismorgan 18 days ago

You can achieve this with numeric sequences too, by having a consistent step and unique offset in all your sequences. For example, if you will never exceed 16 types, reserve four bits as the type discriminant. (You don’t have to use powers of two, but it may be convenient.)

All sequences use step 16.

Type A has discriminant/offset 0, yielding IDs {0, 16, 32, 48, 64, …}.

Type B has discriminant/offset 1, mapping to IDs {1, 17, 33, 49, 65, …}.

All the way up to Type P with discriminant/offset 15 and IDs {15, 31, 47, 63, 79, …}.

This is also trivially invertible so that you can determine the type from the ID.

A more common approach is to make IDs opaque strings and put a type prefix—A0, B12, P34, that kind of thing. But this way you can keep it as a number, if you wish.
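The step-and-offset scheme above can be sketched in a few lines. This is an in-memory illustration only: `makeSequence`, `typeOfId`, and the `OFFSETS` table are hypothetical names, and in a real database the counters would typically be sequences configured with an increment of 16 and a per-table start value.

```javascript
const STEP = 16; // room for 16 types: four bits of discriminant
const OFFSETS = { A: 0, B: 1, P: 15 }; // type names taken from the comment

// Returns a counter for one type: offset, offset + 16, offset + 32, ...
function makeSequence(type) {
  let next = OFFSETS[type];
  return () => {
    const id = next;
    next += STEP;
    return id;
  };
}

// Inverting is a single modulo: the remainder is the discriminant.
function typeOfId(id) {
  const offset = id % STEP;
  return Object.keys(OFFSETS).find((type) => OFFSETS[type] === offset);
}

const nextB = makeSequence("B");
console.log([nextB(), nextB(), nextB()]); // [ 1, 17, 33 ]
console.log(typeOfId(47));                // P
console.log(typeOfId(64));                // A
```

Because the sets of IDs are disjoint, a join against the wrong table finds no matching keys, which is the same property the UUID argument relies on.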

throwawayo2oe 18 days ago

Alternatively just use a shared sequence for all tables.

sgarland 17 days ago

Or just write tests, instead of relying on statistical improbability to prevent disaster.

pyuser583 18 days ago

Yeah this is nice - also helps with grepping dump files.

mamcx 18 days ago

How is this done?

nickpeterson 18 days ago

They just mean you catch incorrect joins more easily because there is usually no overlap in keys between unrelated tables. Using int, you’re usually going to have some shared values between two unrelated tables.

sudoshred 18 days ago

Statistically impossible to inadvertently generate a collision using UUID keys. UUID is designed to be unique when generated across any computer system. Practically speaking, if you have an exactly matching pair of UUIDs from disparate systems, you have found the exact record match. The name gives a hint: "Universally unique identifier". -Not a cryptographer.

1659447091 16 days ago

You might find this thread interesting. UUIDv4 should probably be avoided

https://news.ycombinator.com/item?id=48060054

usrnm 18 days ago

It definitely is possible, just very improbable

echoangle 18 days ago

That’s probably what’s meant by statistically impossible.

pavo-etc 17 days ago

"very" is underselling it

ErroneousBosh 18 days ago

It definitely is possible, just very much a "woah, shit, guys come and look at this!" moment.

beagle3 17 days ago

More like a moment that the guys can’t come because each one was independently struck by lightning.

masklinn 18 days ago

The U means if you join the wrong table your join will always come up empty.

It does not actually make it impossible to query the wrong table; it just tells you quickly when you’ve done so.

tannenfreund87 12 days ago

On the contrary, the need for UUIDs is growing. Once you have multiple users collecting and editing data for a central database, you'll run into primary key conflicts. Not, of course, if your users are constantly online and never lose their connection to the database. But modern use cases have a remote database over the internet and distributed users, often with slow or spotty connections.

I've developed a field survey app for foresters. They use it on toughbooks, tablets and phones. They are collecting spatial data, so the geometry column in the tables gets quite big. The app on the device uses a SQLite (Spatialite) database, the central database is Postgres (PostGIS). They will often edit the same area, so without UUIDs there would be duplicate primary keys, making the database inconsistent. Then I would be flooded with support tickets, which would cause more slowdown than just using UUID4. And the performance drop of UUID7 compared to bigint for a primary key is negligible.

willtemperley 18 days ago

UUIDs make client code so much simpler. Just create a UUID, use it client side to create your object graph and commit or not as appropriate. No need to retrieve an incremented integer.

sgarland 17 days ago

Every DB, even MySQL, can return the autoincrementing integer for you as part of the insert. Postgres, SQLite, and MariaDB (likely others, I’m just not familiar) can even return the rest of the data, should you need that.

IME, most of the arguments for why UUIDs make things better are due to developer ignorance of RDBMS features (or B+tree performance).

willtemperley 10 days ago

I’m aware of insert returning. That’s still more work than “mint a UUID”. Once the incremented id is returned it then has to be set on the model, in some cases like GRDB in Swift it requires the id to be optional which is just annoying.

bob1029 18 days ago

I am finding UUIDs help a lot if your primary schema consumer is an LLM.

Inappropriate aliasing of integer keys allows for silent errors in queries because it will actually return some result a lot of the time. A UUID is immune to this problem. The model recognizes its mistake a lot more reliably when previously non-empty tables start showing up empty after attempting a join.

andersmurphy 18 days ago

Yes this matters even more if you are doing a lot of joins. Naive string UUIDs are 32 bytes (though I use binary uuid in the post which is 16) compared to 8 bytes for a 64-bit int. This matters even more with sqlite as it uses varint encoding. The upshot of all this is your indexes take up a lot less space in memory.
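To make the sizes concrete, here is a sketch that packs a UUID string into raw bytes. The UUID is an arbitrary example value. Note that the 32-byte figure corresponds to the hex digits without hyphens; the canonical text form with hyphens is 36 characters.

```javascript
const text = "123e4567-e89b-12d3-a456-426614174000"; // example UUID
const hex = text.replaceAll("-", "");

// Pack the 32 hex digits into 16 raw bytes, two digits per byte.
const bytes = Uint8Array.from(hex.match(/../g), (pair) => parseInt(pair, 16));

console.log(text.length);   // 36 characters with hyphens
console.log(hex.length);    // 32 characters without
console.log(bytes.length);  // 16 bytes as binary
console.log(BigInt64Array.BYTES_PER_ELEMENT); // 8 bytes for a 64-bit integer
```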

Fire-Dragon-DoL 18 days ago

Providing an ID from the client is a big advantage that's missing though. Especially if you want a UI with optimistic rendering that's dealing with something async

PUSH_AX 18 days ago

What are uuid foot guns?

JamesSwift 16 days ago

They are generally distributed in such a way that values created at nearly the same time don't cluster together, which is less efficient for DBs.

crubier 18 days ago

No one ever got fired for using UUIDs