|
|
|
|
|
by adileo
1328 days ago
|
|
Author here, you're correct, the title is misleading and assumes a match between surrogate and primary keys in a simplistic way, especially for when you can't make use of a natural key. (most of the cases in the domain I work in) If you can't identify a natural key that indicates failure to normalize or even the wrong model of the data. I'm not sure about this point. As a matter of fact a lot of entities in the real world do not have a natural key, or if they have it, it could be subject to changes. How do you identify a user or an anonymous post on a forum? Maybe for an user you could just use the email. Then, what if the user want to change their email?
Or what if one day you want to allow users to sign-up with just their social login? Surrogate keys are always a good level of indirection to keep a consistent identity, and especially useful as foreign key and hence also as a primary key. Of course with some exceptions (eg. intermediate tables, where the PK is spanning across multiple columns) But I'm curious to know also other scenarios since my view is skewed by the business domain I work in. |
|
Primary keys must satisfy a couple of constraints: uniqueness and not NULL. A primary key can change, and all modern RDBMSs can cascade primary key changes to foreign keys. But using a natural key like an email address does open up the possibility of duplicates, whereas surrogate keys never have that problem. With a schema that uses (say) email as a primary key you have decide between preventing adding a new user because of a duplicate key, or allowing the duplicate email address in the database and dealing with it some other way.
I try to use natural keys but as you point out that can get impractical and seem quixotic when surrogate keys solve real-world problems. I don't default to surrogate keys (and I don't use ORMs) but I often fall back on surrogate keys. And I've had to refactor databases away from natural keys to surrogate keys because the original key caused more problems than it solved.
As for making primary keys publicly visible, that comes up in web applications that include IDs in URLs. That information -- that I have user ID 2345 -- lets hackers try scanning the space of IDs to see what they get. UUIDs or some other non-sequential value prevents that to some degree, at the cost of dealing with long keys creating large indexes and more cache thrashing in the RDBMS. Which works best depends on the application and what other measures you can take to prevent hackers digging through your database with HTTP requests.