by sargram01 2407 days ago
I have a hyphen in my last name that caused the California DMV to make one of my last names a middle name, and the Social Security Administration can’t verify my name on their website because of it either.

I feel like it’s time for software to level up. OK, 30 years ago, sure, mistakes were made, but now, if you live on planet Earth, you have to know how names work, given how many thousands of years our current systems have been in place.


"how names work"

Part of the issue is people writing software making decisions about 'how names work', and there being multiple interpretations of it. I've wanted - many times - to build in just 'name' in systems, vs "first name", "last name", "middle name", "suffix", etc. Because... inevitably, clients have to support someone that doesn't fit that mold. The end user probably has dealt with it dozens of times already, but it's still bad for them, and usually unnecessary. MOST of the time, we only ever take "first" and "last" and concat them on the screen anyway, then keep them separated for someone to sort via Excel...

It's actually interesting that frameworks provided by platforms such as .NET, Java, etc., don't include an abstraction for the representation of names.

Such abstractions exist for dates, times, calendars, currencies, calculations with money, and so forth, but not names.

On the one hand I can understand it, because names are so complicated, and how would you sit down and come up with something good enough to represent all of them?

On the other hand they're prevalent in such a high percentage of line of business and consumer facing apps that it's almost ridiculous that every single developer on the face of the earth at one time or another has to come up with their own half-baked implementation.

It's especially ridiculous when you consider that so many of these home-rolled implementations, if not all of them, are rife with terrible flaws that constantly cause frustration and inconvenience to a small but significant number of users.

This is a solved problem from a modeling standpoint. The HL7 Reference Information Model allows any entity (such as a person) to have multiple names. Each name can be tagged with a type (legal, maiden, alias, etc) and validity date range. A name can contain multiple parts in any order, optionally tagged as prefix / suffix / family / given. Names can also be explicitly marked as null if unknown or not assigned. There are open source RIM implementations in several languages.
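For illustration only, here is a rough Python sketch of the shape described above: several names per person, each with a type and validity range, made of ordered and optionally tagged parts. The class and field names (`NamePart`, `PersonName`, `Person`, `null_reason`) are made up for this sketch and are not the actual HL7 RIM identifiers, and the sample person is invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class NamePart:
    text: str
    kind: Optional[str] = None  # "prefix", "suffix", "family", "given", or None if untagged


@dataclass
class PersonName:
    parts: list[NamePart] = field(default_factory=list)  # kept in the order the person writes them
    use: Optional[str] = None          # e.g. "legal", "maiden", "alias"
    valid_from: Optional[date] = None  # None means open-ended
    valid_to: Optional[date] = None
    null_reason: Optional[str] = None  # set when the name is unknown or was never assigned

    def display(self) -> str:
        # Join the parts in their stored order; no reordering by kind.
        return " ".join(p.text for p in self.parts)

    def is_valid_on(self, day: date) -> bool:
        starts_ok = self.valid_from is None or self.valid_from <= day
        ends_ok = self.valid_to is None or day <= self.valid_to
        return starts_ok and ends_ok


@dataclass
class Person:
    names: list[PersonName] = field(default_factory=list)

    def names_on(self, day: date, use: Optional[str] = None) -> list[PersonName]:
        # All names valid on the given day, optionally filtered by type.
        return [n for n in self.names
                if n.is_valid_on(day) and (use is None or n.use == use)]


# Invented example: a maiden name that stopped being current in 2015.
person = Person(names=[
    PersonName([NamePart("Ana", "given"), NamePart("García", "family")],
               use="maiden", valid_to=date(2015, 6, 30)),
    PersonName([NamePart("Ana", "given"), NamePart("García", "family"),
                NamePart("Hill", "family")],
               use="legal", valid_from=date(2015, 7, 1)),
])
```

The point of the sketch is that nothing here assumes a fixed first/middle/last layout: the order of parts is whatever the person gave, and the tags are optional.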
Interesting; I didn't think I'd see an HL7 reference in this thread. I work somewhat with FHIR, which also has a HumanName[1] data type, and I think it handles most of the cases in this thread.

For those not familiar: FHIR is a standard that covers health and patient data. IMO, it's a pretty good model. (HL7 is the organization, and there are a few other standards under it.)

I'm less familiar with RIM; could you link to its definition of a name? (The best I could find suggested that it was nothing more than an unconstrained piece of text.)

[1]: https://www.hl7.org/fhir/datatypes.html#humanname

Unfortunately due to the way the HL7 web site is structured there's no way to give a direct link. Go to the Normative Edition, Foundation, Data Types, Basic Types.

http://www.hl7.org/implement/standards/product_brief.cfm?pro...

The FHIR data model is a little simpler to allow for easier implementation. In the vast majority of real world healthcare use cases it works well. But from a modeling standpoint if you need to cover odd edge cases it sometimes helps to look back at the old RIM.

What's the purpose of structured modelling of names? Why does it matter which parts are family vs given vs whatever else? Lots of the W3C International Examples they give use a `<text>` field. Why not just use that?

https://www.hl7.org/fhir/datatypes-examples.html#2.24.1.13.1

It matters for collation order, if you want to show a list of patients sorted by family name.
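A minimal sketch of that use case, assuming a hypothetical `Patient` record that keeps the full text alongside optional family and given parts. The sample names are only illustrative, and `casefold()` is a stand-in: a real system would want a locale-aware collator rather than plain code point comparison.

```python
from typing import NamedTuple


class Patient(NamedTuple):
    text: str         # full name exactly as the person writes it
    family: str = ""  # may be empty: not everyone has a family name
    given: str = ""


def family_sort_key(p: Patient) -> tuple[str, str, str]:
    # Fall back to the full text when no family name is recorded,
    # so single-name patients still sort somewhere predictable.
    primary = p.family or p.text
    return (primary.casefold(), p.given.casefold(), p.text.casefold())


patients = [
    Patient("Wang Fang", family="Wang", given="Fang"),  # family name written first
    Patient("Karl Marx", family="Marx", given="Karl"),
    Patient("Aung"),                                    # a single name, no family part
    Patient("Sverre Haraldson Bjerkeli", family="Bjerkeli", given="Sverre"),
]

by_family = [p.text for p in sorted(patients, key=family_sort_key)]
```

With only the `text` field, "Wang Fang" and "Karl Marx" could not both be sorted by family name without guessing which token is which.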
A dedicated, well-maintained name abstraction is certainly something that needs to happen. More than interesting, it's a bit bizarre that this hasn't been done yet (AFAIK.)

In terms of developer-facing complexity, this could be a laughably simple thing to use- just a type that supports equality, perhaps ordering, and conversion to string. Only the constructor would need to be complex. :)

I guess the reason this hasn't been done is simply that the implementation would never be "correct"- there is no formal specification of human names out there, and there would always be cases where some poor individual with an unusual name falls afoul of the system. Strictly fewer cases than we have today, where everyone rolls their own name system, but still some; it's not a solvable problem the way timezone conversion is.

But, on the other hand, that's the way the world is- messy. Developers are going to have to learn better best practices for communally doing our best in cases where there is no perfect answer, because there's only going to be more of such cases as tech continues to eat the world.

I actually attempted to do this.

I used ML techniques to help smooth over some of the difficult parts (there are many difficult parts). The hardest cases are ambiguous names, for instance delineating Hispanic vs. Puerto Rican naming conventions (they're different). The fundamental approach involved pushing all ambiguity up to the end user, so they always have the option to correct the system.

https://www.alphanym.com/demo/?jm2

I’m pretty sure it’s solvable. The main problem is that we break up names into first and last to identify the parts, and we do bad data quality checks. Let’s say we did just have one field and a service that was trained on each country’s variations that could return the parts of the name you wanted. So some database and detection system to understand the pattern. It’s definitely possible since we humans do read and understand names just fine in our own locales.
I'm not convinced it is solvable. I don't think the general case is reliably solved even by humans.

E.g. is Carlos the same person as Karl?

Well, that depends. Was one of them localized, or are these the actual given names? Just this weekend I was on an offsite and saw a Spanish book about Karl Marx, or Carlos Marx as they had written it in the title, in the library of the house we stayed in.

Clearly in this instance the names are the same, but that requires knowing that Carlos Marx maps to Karl Marx and that Karl Marx is a famous name; otherwise you can't assume the name was translated.

There were many other names on books in that library. I don't know which one of them - if any - also maps to someone known under another name, because that requires me to know which person they are about.

Is Curt, Curth, Kurt the same person? My uncle had all three on different documents, and delighted in telling people about it.

What about countries like the UK, where there is no legal requirement to notify anyone of a change of name, and where the legal way of formally changing your name - a "deed poll" - is just a document structured in a certain way where you assert that you are known under a certain name? My ex is known under at least three different name combinations, all of which are present on different sets of legal documents.

Some subsets of the issue are solvable, but for example there is no way of taking a full name and returning the "name this person prefers to be known by" because the name does not contain that information. You can make a pretty good guess.

But you'll fail dramatically for people from different countries. And don't think for a second you can guess correctly based on where a name is from - many names are used in different countries, and often as different elements (e.g. firstname one country, lastname another; feminine name one place, masculine another), and many people have names that combine different nationalities (e.g. my son has a name that combined an English firstname, a Nigerian middle name and a Norwegian last name).

The only reliable solution is to not assume any one single string can be used as a generic name - you need to ask what to use within a given context and within whatever constraints you have.

It would help if we quit using the First|Middle|Last terminology and used Surname(if-any)|Personal-name #N|Personal-name #N+1(if-any)...
It would help if we quit using that and had one field for the full name and then another one for "how do you want to be addressed".
We already have an abstraction for names: a text field. Trying to be more clever than that will break for someone's name.
Which is great until the client asks for a friendly name (given name) identifier. GitHub uses name, we use first/last name. So we just shove the GitHub name in the first name spot and ask people to organize what looks right.

Our partners suggested string.split(' '), which produced interesting results against the sample list of github users.
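To give a feel for the "interesting results", here is a small sketch of what splitting on a space tends to do. How the pieces were assigned to first/last is an assumption on my part (first token as "first", the rest as "last"), and the sample names are illustrative rather than taken from any real user list.

```python
def naive_split(full_name: str) -> tuple[str, str]:
    """Assumed use of split(' '): first token is 'first', the rest is 'last'."""
    tokens = full_name.split(" ")
    return tokens[0], " ".join(tokens[1:])


samples = ["Karl Marx", "Mao Zedong", "Mary Ann Smith", "Aung", "Karl  Marx"]
results = {name: naive_split(name) for name in samples}

# Karl Marx       -> ("Karl", "Marx")       fine
# Mao Zedong      -> ("Mao", "Zedong")      the family name lands in "first"
# Mary Ann Smith  -> ("Mary", "Ann Smith")  wrong if the given name is "Mary Ann"
# Aung            -> ("Aung", "")           an empty "last name"
# Karl  Marx      -> ("Karl", " Marx")      a double space leaks into the result
```

None of these failures can be repaired after the fact, because the string alone does not say which token plays which role.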

Use two fields: name and display name. Anything else will break for someone. In most cases where something asks for my name, it's really not even necessary, and it's certainly not necessary to split it into first/last/display/friendly etc.
This is not sufficient if you’re going to localize to languages other than English. In some languages, proper names get declined like other nouns and thus change spelling in different contexts.
A lot of software would be much better without stupid clients asking for the wrong things.
Eh, I understand it. They want to put some name in the identity control in the header. Putting the full name in is guaranteed to go wrong. We might start asking for a nickname for that purpose.
You're assuming that you're 'friendly' enough with your users that they want you to use their given name.
Oh yeah that touches a nerve. I hate it when I have filled in my name and email somewhere, maybe I forgot doing it, and I receive a newsletter, which opens with and greets me with just my first name.

No. Just because I signed up for your mailing list doesn't mean we're on a first name basis! I hardly even remember your business exists, do you remember me? Anything else about me except my email and my full name, from which you deduced my first name? Then let's not pretend we're buddies.

This of course depends on how "friendly" I'm willing to be with said business. Which differs if it's an Etsy store, ordering food online, my bank, insurance, etc. I especially hate it when the newsletter is in fact 99% ads and promo babble, but has this 1% of useful info that I want to be kept up to date on. We're not close, I'm letting you spam my inbox, call me "your grace" or something.

Can you actually go wrong with just using someone's full name, and erring on the side of being a tad too formal? Is this just a problem with marketing companies that want to "connect" and become "buddies"?

This already breaks names which inflect.
Which isn’t really a problem you can solve. If names change in the context that they’re used, it will always be broken, so why break more names by trying to be clever?
I'm not saying you can solve it. A single text field isn't a solution either. You cannot avoid breaking some names.
Actually I think it should be OK, because you’d enter the name in the nominative case and then when you write it on the screen you’d decline it based on the language you’re displaying it in, which would be the same for every name regardless of its origin.
What’s this? A name that changes when you are talking to someone directly? In some Slavic languages this happens.
Usually the name would change according to the full rules of noun inflection in whatever language. In Latin, a noun has 6 cases, of which vocative (indicating direct address to the noun) is one.
Because how names work is extremely diverse between cultures and countries. Here in Germany it’s typical to have several first names and it’s legal to use any of them, even though one might be just the name of your godfather/godmother.

FHIR has a relatively general definition for names, but multiple general and country-specific extensions exist for it: https://www.hl7.org/fhir/datatypes.html#HumanName

It's a string. Don't even try anything else.
Exactly this, and I really don't see what the fuss is all about in the above comments.
The idea that a person's name should be parsed and managed by software is amusing to me. How about just getting rid of concepts like "last name" and "first name" (which already embed a lot of cultural assumptions), and only ask for a "full name"? In some countries people don't have both first and last names. In some countries the last name customarily comes before the first name. In some countries the structure of names is more complicated and the son's name includes a copy of his father's name. I don't think software will really handle all these oddities correctly, given that just a single parent can undermine all the system's rules by choosing an unconventional name for their children.

For what it's worth, in Singapore, which has significant Indian, Chinese, and Malay populations but is also highly westernized, the government identity card provides just a single full name. Parents can choose their children's names in accordance with their culture, or not. You can put your first name before your last name, after it, or surrounding it. Or include your father's name if needed.

By ignoring the structure in the data acquisition phase, you just postpone decisions about structure to the data processing phase, now without the necessary information about the structure (which could have been obtained in the data acquisition phase).

For example, such basic functionality as changing the sort order between given name and surname would be much more complicated.

Besides, the whole story is about the problem that a name including a title was in one field, and later processing (title removal) misbehaves due to insufficient information.
Is it important to parse names like that? You can just do a search by substring in most cases
It sounds like all you really need to handle names reliably is to ask for the entire name in one field, then have another field for their preferred name (which could be the first name, or the middle name, and a diminutive). And if you need to do something more formal with a title (like Mrs Lastname), potentially have that as a third field.

Sometimes the dumb solution is better than trying to be clever, and it saves some trouble with localisation.

W3C recommends the same pattern: [0].

[0]: https://www.w3.org/International/questions/qa-personal-names
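As a sketch of the three-field idea described in this comment (full name, preferred name, optional formal form), something like the following would do. The `Account` class and its field names are hypothetical, chosen just for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    full_name: str                        # stored exactly as typed, never parsed
    preferred_name: Optional[str] = None  # "what should we call you?"
    formal_address: Optional[str] = None  # e.g. "Mrs Lastname", only if the person supplies it

    def greeting(self, formal: bool = False) -> str:
        if formal:
            # Never derive a title or surname: use what was given, else the full name.
            return f"Dear {self.formal_address or self.full_name},"
        # Without a preferred name, err on the side of formality and use the full name.
        return f"Hello {self.preferred_name or self.full_name},"
```

Usage, with made-up values: `Account("Sverre Haraldson Bjerkeli", preferred_name="Sverre").greeting()` gives `"Hello Sverre,"`, while an account with only a full name is always greeted by that full name.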

> Sometimes the dumb solution is better than trying to be clever

It's astonishing how often this turns out to be true, which has been probably the single most important lesson of my career. I think it's that clever solutions tend to depend on more assumptions, which rarely have P(true) = 1.

"First" and "last" is a wrong representation anyway, because it assumes people always write their names in that order.

That breaks for Chinese, Japanese, Korean, and probably multiple other types of names.

s/first name/given name

s/last name/surname

But this has its own issues, like assuming people have either a given name or a surname in the first place.

Also hard to decide which is the surname to use with some names/cultures.

My native country, Norway, went through an assimilation period of standardising surnames a few hundred years ago. Before that your name often was in 3 parts:

First name(s) - father's name - farm/manor/village.

So names were something like "Ivar Ragnarsson of Torp" or "Sverre Haraldson Bjerkeli". (With the -son bit to say whether a son or daughter).

With assimilation into a more standard Continental, Christian, Danish society, and most likely standard registration for tax, people dropped either the farm name or the father's name from their names. They froze the father's name as the surname for future generations and changed the -son to a more Danish -sen for all genders. So, since the 1700s, people have had just 2 parts to their names, unlike Iceland, which has kept the naming tradition.

However, what is common again today is to have 2 surnames. One from each parent. Unhyphenated. Similar to the Spanish convention (first-name - father's surname - mother's surname) but not as standardised, and mostly in the opposite order, with the father's surname at the end being the official family surname. And that makes internationalised computer systems so complicated.

My children have both our surnames, both by choice and necessity so either of us can get through passport control with them. (mother's surname - father's surname). But they had to have their surnames hyphenated to be able to register their births and British passports. Which still angers me today as my family convention of the latter surname being the main one is now mostly ignored.

How does that work after 2 generations? Wouldn’t you end up with names like this, and longer afterwards? Which ones are carried on?

Bob Jones Alexander Richardson Hill

People are free to choose what they want but you mainly keep only the "main" surname from each parent.

I know in England there was a tendency of people keeping both names of powerful families[1][2], then as double-barreled surnames. Which then sometimes went a bit nuts a few generations later if they married into other double-barreled families [3].

I think it was when I visited Stowe School, the seat of the Dukes of Buckingham and looked at the family tree, that I even saw some surnames repeated if they married into other families which shared one of their multi-barreled surnames...

[1] https://www.theguardian.com/lifeandstyle/2017/nov/02/keeping...

[2] https://www.telegraph.co.uk/family/parenting/are-we-heading-...

[3] https://en.wikipedia.org/wiki/Richard_Temple-Nugent-Brydges-...

I don't know about the OP, but in Quebec this is fairly common. Usually you have 2 surnames until you turn 18 and then you choose one of them. I really like the idea of this, personally.
Myanmar is another one, almost everyone has just one given name unless they are specifically following the western or some other style (and that's rare in my experience). The last name field should always be optional at a minimum.
> "First" and "last" is a wrong representation anyway, because it assumes people always write their names in that order.

I do not see this as a problem. If I know English well enough to fill in an English-labeled form on an English webpage, I would also have enough cultural knowledge to translate first name to given name and last name to surname.

> But this has it own issues, like assuming people have either a given name or a surname in the first place.

This is only a problem if the form validates that both fields must be non-NULL. Problem is not with the split itself but with the validation code.
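A minimal sketch of the kind of validation this comment argues for, where the split stays but neither half is mandatory on its own. The function name and error message are invented for the example.

```python
def validate_name_fields(given: str, surname: str) -> list[str]:
    """Return a list of validation errors; an empty list means the input is accepted."""
    errors = []
    given, surname = given.strip(), surname.strip()
    # Only reject the form when both fields are blank.
    # A lone given name or a lone surname is accepted as-is.
    if not given and not surname:
        errors.append("Please enter at least one name.")
    return errors
```

With this rule, someone with a single name can leave the other field empty instead of inventing a value for it.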

I know a Romanian online and his surname comes first... It’s a historic IT clusterfuck to assume all names are firstname, lastname
Worked on some business software previously, and customers insisted on first/last or first/middle/last, despite the fairly obvious issues. They also demanded address fields in a US style despite needing to support international addresses (I still have no idea how their staff handled that).

People want to follow the conventions they know, even apparently if they're told it will cause issues.

Yet in reality, nobody actually needs a name that's sorted by surname: that's a holdover from paper phone books. We have search, and we have stable sorting algos. Every requirement "sort by surname" I've ever seen turned out to mean "sort the names in a predictable way, btw this is the way we always did that, because we always did that."

(Yes, familiarity is a part of UX; but do note that this one specifically is a historical, not intrinsic, motivation)

We used to have problems with non-ASCII chars in names, we fixed that with UTF, we had problems with currencies and numbers, we made libraries that understand locales and even directions of writing, time zones same thing. So it’s time for us to resolve names now with standard libraries that have been thought through like the above.
While I'd probably run into edge cases, it would be nice to actually point to a standard and say "the libraries all support standard XYZ. that's built in - doing it any other way is going to mean problems ABC and cost $$".
I worked at a place that required a middle name. I don't have one, so at their insistence I picked a name, and "Xavier" became my middle name.
Credit card forms handle this perfectly. There is just a name field. It seems odd that this is perfectly acceptable for financial companies that in essence loan billions a year, but it's not OK for anyone else.