Hacker News new | ask | show | jobs
by eric4smith 1849 days ago
Simple rules:

Use integer primary keys internally for identifiers and relationships.

Use English/Other Language permalinks for URL's

Use UUID's in places like API's one-time action links and "private" links that you only want to share with other people.

Worked fine for me for many, many years.

1 comments

A vote here against integer/serial PKs, not only because they leak information, but also because they can result in incorrect joins.

IME it's much more often I've quickly made a table with a serial PK and later wished it were uuid; just about never made a uuid and later wished for the compactness or natural clustering of bigint. Maybe for a table of millions and millions of time-ordered events.

Note I said "internal use". But how can primary keys result in incorrect joins?

Unless you're changing a foreign key, joins will always be correct.

Unless I'm doing something wrong in the last 30 years of using SQL.

If you use serial integer ids and accidentally join on the wrong tables/columns you will get rows back even if the join doesn't make sense, because all serian integer ids have values in common. If you're using UUIDs you will "never" get rows back when joining on the wrong ids and spot your mistake.
I think if you're joining against wrong tables or columns, then you have bigger problems than if the values are BigInt or UUID's.
"Don't ever be wrong" is one way to do it; on the other hand, sometimes we make mistakes anyway, especially in ad hoc queries, and not getting any results back at all helps to spot the problem quickly (more quickly than spending minutes befuddled by the nonempty resultset).
One thing that no one mentions so far is the extra space these ids consume.

Is the value of making that mistake strong enough that you use so much more data to store a UUID value??

Value safety instead of type safety. It's reasonably likely to work; even if you can't trust the UUIDs to be generated by a trustworthy source, when you test the query it'll be wrong even if you accidentally join against a table that has a sufficiently similar schema.

I'm more familiar with MySQL where I don't have anything other than basic, automatically converted types.

But in Postgres, where there's more types, can I create a type that is representationally equivalent to a builtin type but not automatically convertable? So that if I say "select * from donkey left join plant on plant.id = donkey.animal" it says "error: cannot compare PlantId with AnimalId, use an explicit cast if you really want to do this". If this were possible and ergonomic, it would be better than a relatively cryptic null in the case that plants and animals have distinct (UUID) PKs.

> […] but also because they can result in incorrect joins.

Side question: can I get Postgres to throw an error if I try to join on two IDs where neither of the IDs have a foreign key reference to the other?

I wouldn't think so, and this sounds like a mighty footgun!