|
|
|
|
|
by gregjor
1328 days ago
|
|
Primary key seems like a synonym for surrogate key for a few reasons. I think ORMs almost always assume surrogate key-based schemas, for example. It seems that database design has defaulted to surrogate keys over time. My complaints about the article came from my interpretation of primary key equals a surrogate key, I apologize for reading more into it than intended. Primary keys must satisfy a couple of constraints: uniqueness and not NULL. A primary key can change, and all modern RDBMSs can cascade primary key changes to foreign keys. But using a natural key like an email address does open up the possibility of duplicates, whereas surrogate keys never have that problem. With a schema that uses (say) email as a primary key you have decide between preventing adding a new user because of a duplicate key, or allowing the duplicate email address in the database and dealing with it some other way. I try to use natural keys but as you point out that can get impractical and seem quixotic when surrogate keys solve real-world problems. I don't default to surrogate keys (and I don't use ORMs) but I often fall back on surrogate keys. And I've had to refactor databases away from natural keys to surrogate keys because the original key caused more problems than it solved. As for making primary keys publicly visible, that comes up in web applications that include IDs in URLs. That information -- that I have user ID 2345 -- lets hackers try scanning the space of IDs to see what they get. UUIDs or some other non-sequential value prevents that to some degree, at the cost of dealing with long keys creating large indexes and more cache thrashing in the RDBMS. Which works best depends on the application and what other measures you can take to prevent hackers digging through your database with HTTP requests. |
|
* at the cost of dealing with long keys creating large indexes and more cache thrashing in the RDBMS. *
This is actually one of the point I'm currently dealing with. Especially finding the right trade-off to manage records in the order of hundred millions/billions, the right decision could save GB of space both on disk and RAM and seconds in computation during each JOIN operation for example.
That's why I drafted out a comparison table, sometimes I see people start using UUIDs right away just because they are offered as out of the box in some ORMs or just because it souds "cool", without knowing that there are better or simplier alternatives out there.