Hacker News new | ask | show | jobs
by ryanbrunner 3666 days ago
The problem is there's very few examples of real-world data that actually matches a primary key - i.e. is guaranteed to be unique and never changes. I've been burned by this so many times. In reality, everything needs to at least have some capability to change.
1 comments

Unique and never changes can be two things concatenated, not one thing.

If that concatenation is too long you can ram anything thru a hash to get a constant length smoothly distributed key. Its also fun to use "weak" hashes for this because it trolls wanna be security types who don't understand the application.

Given the above, some important things concatenated and hashed works. Can always add an application ID or importer ID or process ID or a timestamp of high enough resolution.

Obviously needs are different if you're trying to create a bank user database vs deduplicating sampled engineering data.

So something that's unique is NOW() (assuming low enough sample rate LOL) and something that never changes is a laser serial number, concatenate those and ram thru a hash to make it small and fit.

Concatenating helps you with uniqueness, but not with whether things change (if anything that's part of a composite key changes, it can no longer reliably be part of a key). And it's the changing part that's not realistic with natural keys.

Once you get into application ids / timestamps / whatever you're no longer really using a natural key in my mind, you're just making your own algorithm for a surrogate key.

> So something that's unique is NOW() (assuming low enough sample rate LOL) and something that never changes is a laser serial number, concatenate those and ram thru a hash to make it small and fit.

That sounds very similar to UUID version 1 except that you replaced MAC address with some other identifier and added a hashing step.

> Unique and never changes can be two things concatenated,

No, for a good primary key, the "unique" parts can be due to concatenation, but the whole key (and thus all the elements) needs to be unchanging. [0]

[0] Modern databases can actually deal with changeable PKs, though with a potentially serious performance hit, but in many use cases where data travels outside the database for some kind of interaction where the results need to get reentered into the database, this is still a problem, since you can't (in any general way) cascade updates to things (which may not even be online systems, e.g., paper records feeding human processes) outside of the DB.

This is a lot of thinking and work to replace an auto-incremented integer, which is simple and works fine in most cases, though.
> So something that's unique is NOW() (assuming low enough sample rate LOL)

We can all invent algorithms for nice unique values when we include the caveat "apart from the edge cases". It's those edge cases that bugger everything up.

"Hash of the data defining the entity identified" is how version 3 and 5 UUIDs work (plus a namespace). If you really want a weak hash use version 3 - it uses MD5.