| > The UUID would be an example of an external key (for e.g. preventing crawling keys being easy). This article mentions a few reasons why you may later decide there are better external keys. So we are talking about "external" keys (ie. visible outside the database). We are back to square one: externally visible surrogate keys are problematic because they are detached from real world information they are supposed to identify and hence don't really identify anything (see my example about GDPR). It does not matter if they are random or not. > How are you going to trace all those records if the requester has changed their name, phone number and email since they signed up if you don't have a surrogate key? And how does surrogate key help? I don't know the surrogate key that identifies my records in your database.
Even if you use them internally it is an implementation detail. If you keep information about the time information was captured, you can at least ask me "what was your phone number last time we've interacted and when was it?" > I think that spirals into way more complexity than you're thinking. This complexity is there whether you want it or not and you're not going to eliminate it with surrogate keys. It has to be explicitly taken care of. DBMSes provide means to tackle this essential complexity: bi-temporal extensions, views, materialized views etc. Event sourcing is a somewhat convoluted way to attack this problem as well. > Those queries are incredibly simple with surrogate keys: "SELECT * FROM phone_number_changes WHERE user_id = blah". Sure, but those queries are useless if you just don't know user_id. |
All IDs are detached from the real world. That’s the core premise of an ID. It’s a bit of information that is unique to someone or something, but it is not that person or thing.
Your phone number is a random number that the phone company points to your phone. Your house has a street name and number that someone decided to assign to it. Your email is an arbitrary label that is used to route mail to some server. Your social security number is some arbitrary id the government assigned you. Even your name is an arbitrary label that your parents assigned to you.
Fundamentally your notion that there is some “real world” identifier is not true. No identifiers are real. They are all abstractions and the question is not whether the “real” identifier is better than a “fake” one, but whether an existing identifier is better than one you create for your system.
I would argue that in most cases, creating your own ID is going to save you headaches in the long term. If you bake SSN or Email or Phone Number throughout your system, you will make it a pain for yourself when inevitably someone needs to change their ID and you have cascading updates needed throughout your entire system.