by bartread 2407 days ago
It's actually interesting that frameworks provided by platforms such as .NET, Java, etc., don't include an abstraction for the representation of names.

Such abstractions exist for dates, times, calendars, currencies, calculations with money, and so forth, but not names.

On the one hand I can understand it, because names are so complicated, and how would you sit down and come up with something good enough to represent all of them?

On the other hand, they're prevalent in such a high percentage of line-of-business and consumer-facing apps that it's almost ridiculous that every single developer on the face of the earth has, at one time or another, to come up with their own half-baked implementation.

It's especially ridiculous when you consider that so many of these home-rolled implementations, if not all of them, are rife with terrible flaws that constantly cause frustration and inconvenience to a small but significant number of users.


This is a solved problem from a modeling standpoint. The HL7 Reference Information Model allows any entity (such as a person) to have multiple names. Each name can be tagged with a type (legal, maiden, alias, etc) and validity date range. A name can contain multiple parts in any order, optionally tagged as prefix / suffix / family / given. Names can also be explicitly marked as null if unknown or not assigned. There are open source RIM implementations in several languages.
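To make the shape of that model concrete, here is a minimal sketch in Python. It is not the RIM's actual schema: the class and field names (NamePart, PersonName, use, valid_from, null_reason and so on) are assumptions chosen for illustration, and the sample person is made up. It only mirrors the properties described above: several names per entity, a type and validity range per name, ordered and optionally tagged parts, and an explicit null marker.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class NamePart:
    text: str
    kind: Optional[str] = None  # "prefix", "suffix", "family", "given", or None if untagged


@dataclass
class PersonName:
    parts: List[NamePart] = field(default_factory=list)  # order is preserved as entered
    use: Optional[str] = None            # e.g. "legal", "maiden", "alias"
    valid_from: Optional[date] = None    # None means open-ended
    valid_to: Optional[date] = None
    null_reason: Optional[str] = None    # set when the name is unknown or not assigned

    def is_valid_on(self, day: date) -> bool:
        after_start = self.valid_from is None or day >= self.valid_from
        before_end = self.valid_to is None or day <= self.valid_to
        return after_start and before_end

    def parts_of(self, kind: str) -> List[str]:
        return [p.text for p in self.parts if p.kind == kind]

    def __str__(self) -> str:
        return " ".join(p.text for p in self.parts)


@dataclass
class Person:
    names: List[PersonName] = field(default_factory=list)

    def names_on(self, day: date) -> List[PersonName]:
        return [n for n in self.names if n.is_valid_on(day)]


# Made-up example data: one person with a maiden name and a later legal name.
maiden = PersonName([NamePart("Anna", "given"), NamePart("Berg", "family")],
                    use="maiden", valid_to=date(2010, 5, 1))
legal = PersonName([NamePart("Anna", "given"), NamePart("Lind", "family")],
                   use="legal", valid_from=date(2010, 5, 2))
unknown = PersonName(null_reason="unknown")  # explicitly null, no parts
person = Person([maiden, legal])

print([str(n) for n in person.names_on(date(2015, 1, 1))])
print(legal.parts_of("family"))
```

The point of the sketch is that nothing forces a first/last split: parts can be left untagged, and a caller asks for the family parts only when it needs them.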
Interesting; I didn't think I'd see an HL7 reference in this thread. I work somewhat with FHIR, which also has a HumanName[1] data type, and I think it handles most of the cases in this thread.

For those not familiar: FHIR is a standard that covers health and patient data. IMO, it's a pretty good model. (HL7 is the organization, and there are a few other standards under it.)

I'm less familiar with RIM; could you link to its definition of a name? (The best I could find suggested that it was nothing more than an unconstrained piece of text.)

[1]: https://www.hl7.org/fhir/datatypes.html#humanname

Unfortunately, due to the way the HL7 web site is structured, there's no way to give a direct link. Go to the Normative Edition, Foundation, Data Types, Basic Types.

http://www.hl7.org/implement/standards/product_brief.cfm?pro...

The FHIR data model is a little simpler to allow for easier implementation. In the vast majority of real world healthcare use cases it works well. But from a modeling standpoint if you need to cover odd edge cases it sometimes helps to look back at the old RIM.

What's the purpose of structured modelling of names? Why does it matter which parts are family vs given vs whatever else? Lots of the W3C International Examples they give use a `<text>` field. Why not just use that?

https://www.hl7.org/fhir/datatypes-examples.html#2.24.1.13.1

It matters for collation order, if you want to show a list of patients sorted by family name.
Is this a current use case, or did that just make sense 100 years ago? (As in these assumptions: "tracking actual relationships in data is Hard, and family name correlates strongly with real-world relationships") I mean, it sort of works even today, but neither of the assumptions is as strong anymore.
It's still a current use case for healthcare provider workflows. Also helps a lot when doing automated record linkage between different systems.
A dedicated, well-maintained name abstraction is certainly something that needs to happen. More than interesting, it's a bit bizarre that this hasn't been done yet (AFAIK.)

In terms of developer-facing complexity, this could be a laughably simple thing to use- just a type that supports equality, perhaps ordering, and conversion to string. Only the constructor would need to be complex. :)

I guess the reason this hasn't been done is simply that the implementation would never be "correct"- there is no formal specification of human names out there, and there would always be cases where some poor individual with an unusual name falls afoul of the system. Strictly fewer cases than we have today, where everyone rolls their own name system, but still some; it's not a solvable problem the way timezone conversion is.

But, on the other hand, that's the way the world is- messy. Developers are going to have to learn better best practices for communally doing our best in cases where there is no perfect answer, because there's only going to be more of such cases as tech continues to eat the world.

I actually attempted to do this.

I used ML techniques to help smooth over some of the difficult parts (there are many difficult parts). The hardest cases are ambiguous names, for instance delineating Hispanic vs. Puerto Rican naming conventions (they're different). The fundamental approach involved pushing all ambiguity up to the end user, so they always have the option to correct the system.

https://www.alphanym.com/demo/?jm2

I’m pretty sure it’s solvable; the main problem is that we break up names into first and last to identify the parts, and we do bad data quality checks. Let’s say we had just one field and a service, trained on each country’s variations, that could return the parts of the name you wanted: some database and detection system to understand the pattern. It’s definitely possible, since we humans do read and understand names just fine in our own locales.
I'm not convinced it is solvable. I don't think the general case is reliably solved even by humans.

E.g. is Carlos the same person as Karl?

Well, that depends. Was one of them localized, or are these the actual given names? Just this weekend I was on an offsite and saw a Spanish book about Karl Marx, or Carlos Marx as they had written it in the title, in the library of the house we stayed in.

Clearly in this instance the names are the same, but that requires knowing that Carlos Marx maps to Karl Marx and that Karl Marx is a famous name; otherwise you can't assume the name was translated.

There were many other names on books in that library. I don't know which one of them - if any - also maps to someone known under another name, because that requires me to know which person they are about.

Is Curt, Curth, Kurt the same person? My uncle had all three on different documents, and delighted in telling people about it.

What about countries like the UK, where there is no legal requirement to notify anyone of a change of name, and where the legal way of formally changing your name, a "deed poll", is just a document structured in a certain way where you assert that you are known under a certain name? My ex is known under at least three different name combinations, all of which are present on different sets of legal documents.

Some subsets of the issue are solvable, but for example there is no way of taking a full name and returning the "name this person prefers to be known by", because the name does not contain that information. You can make a pretty good guess.

But you'll fail dramatically for people from different countries. And don't think for a second you can guess correctly based on where a name is from - many names are used in different countries, and often as different elements (e.g. firstname one country, lastname another; feminine name one place, masculine another), and many people have names that combine different nationalities (e.g. my son has a name that combined an English firstname, a Nigerian middle name and a Norwegian last name).

The only reliable solution is to not assume any one single string can be used as a generic name - you need to ask what to use within a given context and within whatever constraints you have.

It would help if he quit using the First|Middle|Last| terminology and used Surname(if-any)|Personal-name #N|Personal-name #N+1(if-any)...
It would help if we quit using that and had one field for the full name and then another one for "how do you want to be addressed".
I've in the past written stuff that generated an index of names; this was sorted. While you can certainly sort on a free-form text field, the culture here is that name indexes are generally sorted by family name. So, to do that, you have to have some understanding of what the family name is, which a free-form text field does not give you.

But a lot of software has no need for the breakdown, and would be better served by a free-form field.
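A small sketch of the indexing case, in Python. The Entry type, its field names and the sample entries are assumptions for illustration, and plain casefold() stands in for real locale-aware collation, which this does not attempt. It shows why a free-form field alone cannot produce a family-name index, and how an optional family tag can sit next to the free-form text.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    text: str                     # the name exactly as the person entered it
    family: Optional[str] = None  # optional tag, only collected if an index needs it


def index_key(entry: Entry):
    # Sort by the tagged family name when present, otherwise by the full text.
    primary = entry.family if entry.family else entry.text
    return (primary.casefold(), entry.text.casefold())


entries: List[Entry] = [
    Entry("Karl Marx", "Marx"),
    Entry("Ada Lovelace", "Lovelace"),
    Entry("Madonna"),  # no family name to tag
]

by_text = [e.text for e in sorted(entries, key=lambda e: e.text.casefold())]
by_family = [e.text for e in sorted(entries, key=index_key)]

print(by_text)
print(by_family)
```

Software that never builds such an index can ignore the family field entirely and keep only text.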

Then at least be upfront with the user about why you're asking, because they might well answer differently (e.g. include a different number of parts of their last name) if they know your purpose is to sort the name than if they think it's being used for a different purpose. They might even give totally different names.

That's the most important part of the comment above: The concept of a name is so overloaded that unless you ask about the string to use for the specific purposes you intend to use it, then there is very little you can do with it.

We already have an abstraction for names: a text field. Trying to be more clever than that will break for someone's name.
Which is great until the client asks for a friendly name (given name) identifier. GitHub uses name, we use first/last name. So we just shove the GitHub name in the first name spot and ask people to organize what looks right.

Our partners suggested string.split(' '), which produced interesting results against the sample list of github users.
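For anyone wondering what "interesting" looks like, here is a minimal Python sketch. It assumes the split is used the obvious way, first token as first name and the remainder as last name; how the partners actually meant to use it is not stated, and the function name naive_split and the sample names are only illustrative.

```python
from typing import Tuple


def naive_split(full_name: str) -> Tuple[str, str]:
    # First token becomes the "first name", everything after it the "last name".
    parts = full_name.split(" ", 1)
    first = parts[0]
    last = parts[1] if len(parts) > 1 else ""
    return first, last


print(naive_split("Madonna"))         # mononym: the last name comes back empty
print(naive_split("Mao Zedong"))      # family name comes first, so it lands in "first"
print(naive_split("Mary Ann Smith"))  # a two-word given name gets cut in half
print(naive_split(" Ada Lovelace"))   # a stray leading space yields an empty first name
```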

Use two fields: name and display name. Anything else will break for someone. In most cases where something asks for my name, it's really not even necessary, and certainly not necessary to split it into first/last/display/friendly etc.
This is not sufficient if you’re going to localize to languages other than English. In some languages, proper names get declined like other nouns and thus change spelling in different contexts.
Are there examples where splitting it in First/Last name (or any other split) would help with that? It seems like that would always either be a problem or something to design around while localizing.
It’s fundamentally language specific, so any comprehensive solution is going to need to interface with the localization system. Really, names should be keyed by (user, language, tag) triples, where the localization defines the acceptable tags based on language requirements. For example, a single person may need their name stored as:

  en:disp     Eric
  is:disp:nf  Eiríkur
  is:disp:þf  Eirík
  is:disp:þgf Eiríki
  is:disp:ef  Eiríks
Designing a UI to collect this information is left as an exercise for the reader.
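The storage side of that idea, at least, is easy to sketch. The following Python assumes a plain in-memory dict keyed by (user, language, tag); the user id 42, the lookup function and its fallback to the English display form are assumptions for illustration, not part of the proposal above.

```python
from typing import Dict, Tuple

NameKey = Tuple[int, str, str]  # (user id, language code, tag)

# Forms from the example above, stored for a hypothetical user 42.
names: Dict[NameKey, str] = {
    (42, "en", "disp"): "Eric",
    (42, "is", "disp:nf"): "Eiríkur",
    (42, "is", "disp:þf"): "Eirík",
    (42, "is", "disp:þgf"): "Eiríki",
    (42, "is", "disp:ef"): "Eiríks",
}


def lookup(store: Dict[NameKey, str], user: int, language: str, tag: str,
           fallback: Tuple[str, str] = ("en", "disp")) -> str:
    # Prefer the exact form the localization asked for.
    exact = store.get((user, language, tag))
    if exact is not None:
        return exact
    # Otherwise fall back to one agreed default form; raises KeyError if that is missing too.
    return store[(user, *fallback)]


print(lookup(names, 42, "is", "disp:þgf"))
print(lookup(names, 42, "de", "disp"))
```

The fallback is a design choice, not a requirement: a stricter system could refuse to render a sentence for which no stored form exists.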
A lot of software would be much better without stupid clients asking for the wrong things.
Eh, I understand it. They want to put some name in the identity control in the header. Putting the full name in is guaranteed to go wrong. We might start asking for a nickname for that purpose.
Why is the full name guaranteed to go wrong? A better assumption is that there is no reasonable nickname and you have to use their full name.
You're assuming that you're 'friendly' enough with your users that they want you to use their given name.
Oh yeah that touches a nerve. I hate it when I have filled in my name and email somewhere, maybe I forgot doing it, and I receive a newsletter, which opens with and greets me with just my first name.

No. Just because I signed up for your mailing list doesn't mean we're on a first name basis! I hardly even remember your business exists, do you remember me? Anything else about me except my email and my full name, from which you deduced my first name? Then let's not pretend we're buddies.

This of course depends on how "friendly" I'm willing to be with said business. Which differs if it's an Etsy store, ordering food online, my bank, insurance, etc. I especially hate it when the newsletter is in fact 99% ads and promo babble, but has this 1% of useful info that I want to be kept up to date on. We're not close, I'm letting you spam my inbox, call me "your grace" or something.

Can you actually go wrong with just using someone's full name, and erring on the side of being a tad too formal? Is this just a problem with marketing companies that want to "connect" and become "buddies"?

This already breaks names which inflect.
Which isn’t really a problem you can solve. If names change in the context that they’re used, it will always be broken, so why break more names by trying to be clever?
I'm not saying you can solve it. A single text field isn't a solution either. You cannot avoid breaking some names.
A single field breaks fewer names than forced first and last names, though, and is a simpler implementation too. Plus, as long as you accept any input (besides blank, I guess), then the only way it will be broken is during display and at least the user sees the exact name they typed in, exactly how they typed it in.
A single field also breaks the expectation that people do not get called by their full names in every interaction. This is a very common expectation, and violating it makes you sound subtly more like an evil robot.

Is this a lesser offense than mangling a name that doesn't cleanly split into first/last? At the individual scale, probably.

The impact, in aggregate, on UX/sales/utility? Could definitely go either way depending on your userbase.

Actually I think it should be ok, because you’d enter the name in the nominative case and then when you write it on the screen you’d decline it based on the language you’re displaying it in, which would be the same for every name regardless of its origin.
Tried that once. Horrible, terrible, no good idea. (The only rule you can be sure of is "there are countless exceptions, and exceptions from the exceptions", everything else is a minefield in a quicksand) Asking for "how should we address you" is far easier, even if a few users fill in "Your Galactic Imperial Majesty".
What’s this? A name that changes when you are talking to someone directly? In some Slavic languages this happens.
Usually the name would change according to the full rules of noun inflection in whatever language. In Latin, a noun has 6 cases, of which vocative (indicating direct address to the noun) is one.
Irish has a vocative case that can modify names, and is an official language of a UN member state.
Because how names work is extremely diverse across cultures and countries. Here in Germany it’s typical to have several first names and it’s legal to use any of them, even though one might be just the name of your godfather/godmother.

FHIR has a relatively general definition for names, but multiple general and country-specific extensions exist for it: https://www.hl7.org/fhir/datatypes.html#HumanName

It's a string. Don't even try anything else.
Exactly this, and I really don't see what the fuss is all about in the above comments.