Hacker News new | ask | show | jobs
by 3pt14159 1840 days ago
> Now, sometimes a table has a natural primary key, for example the social security number of a country’s citizens.

You know, you think that, but it's never that simple. The field was added incorrectly and nobody noticed until the value is in countless tables that you now need to simultaneously update or the value is something that's supposed to be semi-secret, so now a low level support staff can't reference the row when dealing with a request. Or the table's requirements change and now you need to track two different kinds of data or data that is missing the field.

Me, I always just have the table make its own ID. It is just simpler, even when you think it is overkill.

7 comments

In Spain each person has a unique ID number assigned at birth. The numbers for newborns are geographically pre-distributed to guarantee uniqueness despite delay in paperwork. It is universally accepted that this ID "number" (it actually has one letter too) is all you need to identify yourself, ever.

Except that I knew a coworker who had a duplicate ID. An extremely rare event, they messed up the pre-assignment and there is another dude somewhere with his same ID. So from time to time, some system would tell him that his ID was already registered. A lot of banks and stuff like private healthcare systems like to use the DNI as usernames.

He tried to get his ID changed, but that was such a foreign concept to any of the involved institutions, that he had to give up because there simply is no such procedure. I guess he could have taken it to court, but the guy decided to just live with it (the justice system is quite slow here).

The fact that it's fixed / can never be changed is a massive problem with social security numbers. That, and the fact it's often used as authentication instead of identification. They're moving away from that slowly, but it's taking a lot of time and effort.
The problem with ssn is specific to the US. Other countries have sane ways of authenticating citizens and the personal id number is just used as a global foreign kes for all government or public or bank database where you need to uniquely identify a citizen.
It does seem that a "natural key" is frequently just a really foreign key in a database you and your org don't manage.
That's a good observation. The only meaningful distinction between a natural key and a surrogate key is whether the number ever escapes the original system.

For instance, a driver's license number is printed on the card itself, so a human sees it. Therefore, it's a natural key, just like a name.

When you decide that whatever natural keys already exist aren't good enough for your organization, and you make a new key, it's not good to think of that as a surrogate key. The number will make it out somehow (as a "record locator" in a customer support call or something), and eventually become a natural key.

It's best to just plan for any new key to be a natural key, which means using best practices for natural keys. That means it should be something reasonable to print, read, say, and hear; and it should also follow a pattern so it can be distinguished from other special numbers.

Auto-increment is a shortcut, but usually not great in the long term unless it's something that will be well-contained inside the database as an implementation detail (e.g. a join key designed to refer to rarely-accessed fields of a wide table).

> > Now, sometimes a table has a natural primary key, for example the social security number of a country’s citizens.

> You know, you think that, but it's never that simple.

It’s that simple if you’re the Social Security Administration and its a table of Social Security Accounts, not people.

Other than that, using SSNs as a primary key is just plain wrong.

> It’s that simple if you’re the Social Security Administration and its a table of Social Security Accounts, not people.

Nah, they keep track of duplicate usage: https://www.nbcnews.com/technolog/odds-someone-else-has-your...

> The IRS often knows when this happens, when the imposter pays taxes. The Social Security Administration knows, too, for the same reason. And the nation's credit bureaus usually know, because the imposter often ends up applying for some form of credit. Plenty of financial institutions also have access to this information.

Maybe using them as passwords is what's wrong.
How do you cope if you want to record people who don't have an SSN or equivalent? (e.g. I have none, because the country of my citizenship doesn't issue anything comparable)

Or if you're expanding to countries whose SSNs clash with each other?

SSN as PK is wrong regardless of whether you're doing SSN as password.

Right? Credit card numbers as a primary key is far more efficient.
I finally got our company to standardize on someone's employee number as a primary key for everything employee related. It's a simple monotonically increasing integer value -- the best possible primary key.

We moved to a new HR system and they have a set of "reserved" employee numbers that cannot be used and we have employee numbers in that range. Arg!

We had a classic situation at a software house I worked at in the '80s - employee numbers were 1-999 and then jumped to 5,000 - because, you guessed it, this "unique" field was used with magic numbers 1,000 - 5,000 being reserved for project ids in various key accounting systems!

And we were supposed to teach our customers good design principles...

In an event-sourced setup, I found that for projections, this is less of a problem. And opens some possibilities, like more semantic schema's and easier, or simpler API's.

A projection in ES, is more a cache, not your primary store. The primary store is the eventlog. The latter should, obviuosly, never use natural ID's.

I've been bitten by using natural keys on several occasions, but I can' think of a time surrogate keys failed me beyond the tediousness of implementation.
In the Netherlands SSNs are not unique, they handed out some duplicate ones back in the day. So not great as a primary key. Besides, I think using them as primary keys is illegal anyway.
In Denmark they are specifically not allowed to be used as a primary key. I mean many do, but technically your suppose to have a separate internal ID and then you use the SSN to look up that ID.
Huh? How does sign-in work for people with duplicate ID’s?
SSN's predate sign-in and the internet. Anyway there is a digital government sigle-sign-on solution called DigID, but you need to create an account etc for it using an e-mail address as the sign-in.
Imagine a man with a stick or gun coming to your office to shackle you in chains for how you arranged your database schema.