Hacker News new | ask | show | jobs
by sk5t 1849 days ago
A vote here against integer/serial PKs, not only because they leak information, but also because they can result in incorrect joins.

IME it's much more often I've quickly made a table with a serial PK and later wished it were uuid; just about never made a uuid and later wished for the compactness or natural clustering of bigint. Maybe for a table of millions and millions of time-ordered events.

2 comments

Note I said "internal use". But how can primary keys result in incorrect joins?

Unless you're changing a foreign key, joins will always be correct.

Unless I'm doing something wrong in the last 30 years of using SQL.

If you use serial integer ids and accidentally join on the wrong tables/columns you will get rows back even if the join doesn't make sense, because all serian integer ids have values in common. If you're using UUIDs you will "never" get rows back when joining on the wrong ids and spot your mistake.
I think if you're joining against wrong tables or columns, then you have bigger problems than if the values are BigInt or UUID's.
"Don't ever be wrong" is one way to do it; on the other hand, sometimes we make mistakes anyway, especially in ad hoc queries, and not getting any results back at all helps to spot the problem quickly (more quickly than spending minutes befuddled by the nonempty resultset).
One thing that no one mentions so far is the extra space these ids consume.

Is the value of making that mistake strong enough that you use so much more data to store a UUID value??

Well, I in fact did mention the "compactness" of integers at the start of this thread :)
Value safety instead of type safety. It's reasonably likely to work; even if you can't trust the UUIDs to be generated by a trustworthy source, when you test the query it'll be wrong even if you accidentally join against a table that has a sufficiently similar schema.

I'm more familiar with MySQL where I don't have anything other than basic, automatically converted types.

But in Postgres, where there's more types, can I create a type that is representationally equivalent to a builtin type but not automatically convertable? So that if I say "select * from donkey left join plant on plant.id = donkey.animal" it says "error: cannot compare PlantId with AnimalId, use an explicit cast if you really want to do this". If this were possible and ergonomic, it would be better than a relatively cryptic null in the case that plants and animals have distinct (UUID) PKs.

> […] but also because they can result in incorrect joins.

Side question: can I get Postgres to throw an error if I try to join on two IDs where neither of the IDs have a foreign key reference to the other?

I wouldn't think so, and this sounds like a mighty footgun!