by maxander 2417 days ago
A dedicated, well-maintained name abstraction is certainly something that needs to happen. More than interesting, it's a bit bizarre that this hasn't been done yet (AFAIK.)

In terms of developer-facing complexity, this could be a laughably simple thing to use: just a type that supports equality, perhaps ordering, and conversion to string. Only the constructor would need to be complex. :)
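To make that concrete, here is a minimal Python sketch of such a type. Everything in it is an assumption for illustration: the class name `PersonName` is hypothetical, and the constructor only does Unicode normalization and whitespace cleanup, standing in for the genuinely hard logic a real library would need. Treating case-insensitive matches as equal is also just one possible choice.

```python
import unicodedata
from functools import total_ordering


@total_ordering
class PersonName:
    """Hypothetical opaque name type: equality, ordering, str()."""

    def __init__(self, raw: str):
        # All the complexity would live here. This stand-in only
        # normalizes Unicode and collapses runs of whitespace.
        normalized = unicodedata.normalize("NFC", raw)
        self._display = " ".join(normalized.split())
        # Assumption: compare case-insensitively on the cleaned string.
        self._key = self._display.casefold()

    def __eq__(self, other):
        if not isinstance(other, PersonName):
            return NotImplemented
        return self._key == other._key

    def __lt__(self, other):
        if not isinstance(other, PersonName):
            return NotImplemented
        return self._key < other._key

    def __hash__(self):
        return hash(self._key)

    def __str__(self):
        return self._display


a = PersonName("  Karl   Marx ")
b = PersonName("karl marx")
print(a == b)   # True
print(str(a))   # Karl Marx
```

Callers never look inside the value; they only compare, sort, and print it, so the constructor can be improved later without touching them.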

I guess the reason this hasn't been done is simply that the implementation would never be "correct": there is no formal specification of human names out there, and there would always be cases where some poor individual with an unusual name falls afoul of the system. Strictly fewer cases than we have today, where everyone rolls their own name system, but still some; it's not a solvable problem the way timezone conversion is.

But, on the other hand, that's the way the world is: messy. Developers are going to have to learn better practices for communally doing our best in cases where there is no perfect answer, because there are only going to be more such cases as tech continues to eat the world.


I actually attempted to do this.

I used ML techniques to help smooth over some of the difficult parts (there are many difficult parts). The hardest cases are ambiguous names, for instance delineating Hispanic vs. Puerto Rican naming conventions (they're different). The fundamental approach involved pushing all ambiguity up to the end user, so they always have the option to correct the system.

https://www.alphanym.com/demo/?jm2

I’m pretty sure it’s solvable. The main problem is that we break names up into first and last to identify the parts, and we do poor data-quality checks. Let’s say we had just one field and a service, trained on each country’s variations, that could return the parts of the name you wanted: some database and detection system to understand the pattern. It’s definitely possible, since we humans read and understand names just fine in our own locales.
I'm not convinced it is solvable. I don't think the general case is reliably solved even by humans.

E.g. is Carlos the same person as Karl?

Well, that depends. Was one of them localized, or are these the actual given names? Just this weekend I was on an offsite and saw a Spanish book about Karl Marx, or Carlos Marx as they had written it in the title, in the library of the house we stayed in.

Clearly in this instance the names are the same, but that requires knowing that Carlos Marx maps to Karl Marx and that Karl Marx is a famous name; otherwise you can't assume the name was translated.

There were many other names on books in that library. I don't know which one of them - if any - also maps to someone known under another name, because that requires me to know which person they are about.

Is Curt, Curth, Kurt the same person? My uncle had all three on different documents, and delighted in telling people about it.

What about countries like the UK, where there is no legal requirement to notify anyone of a change of name, and where the legal way of formally changing your name, a "deed poll", is just a document structured in a certain way in which you assert that you are known under a certain name? My ex is known under at least three different name combinations, all of which are present on different sets of legal documents.

Some subsets of the issue are solvable, but for example there is no way of taking a full name and returning the "name this person prefers to be known by", because the name does not contain that information. You can make a pretty good guess.

But you'll fail dramatically for people from different countries. And don't think for a second you can guess correctly based on where a name is from: many names are used in different countries, and often as different elements (e.g. first name in one country, last name in another; feminine name in one place, masculine in another), and many people have names that combine different nationalities (e.g. my son has a name that combines an English first name, a Nigerian middle name and a Norwegian last name).

The only reliable solution is to not assume any one single string can be used as a generic name - you need to ask what to use within a given context and within whatever constraints you have.

It would help if he quit using the First|Middle|Last| terminology and used Surname(if-any)|Personal-name #N|Personal-name #N+1(if-any)...
It would help if we quit using that and had one field for the full name and then another one for "how do you want to be addressed".
In the past I've written software that generated a sorted index of names. While you can certainly sort on a free-form text field, the culture here is that name indexes are generally sorted by family name. So, to do that, you have to have some understanding of what the family name is, which a free-form text field does not give you.
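One way to square this with a free-form field is to store an optional "sort as" string that the person supplies, instead of parsing out a family name. The Python sketch below is only an illustration: `IndexEntry`, `sort_as` and `index_order` are hypothetical names, the sample entries are made up, and plain `casefold()` ordering is a simplification, since real collation is locale-dependent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    full_name: str                 # free-form, exactly as entered
    sort_as: Optional[str] = None  # supplied by the person, never parsed


def index_order(entries: List[IndexEntry]) -> List[IndexEntry]:
    # Use the explicit sort string when present; otherwise fall back
    # to the free-form name as written.
    return sorted(entries, key=lambda e: (e.sort_as or e.full_name).casefold())


entries = [
    IndexEntry("Karl Marx", sort_as="Marx, Karl"),
    IndexEntry("Björk"),
    IndexEntry("Gabriel García Márquez", sort_as="García Márquez, Gabriel"),
]
for entry in index_order(entries):
    print(entry.full_name)
```

The index never has to decide which part is the family name; it only uses what it was told.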

But a lot of software has no need for the breakdown, and would be better served by a free-form field.

Then at least be upfront with the user about why you're asking, because they might well answer differently (e.g. include a different number of parts of their last name) if they know your purpose is to sort the name than if they think it's being used for a different purpose. They might even give totally different names.

That's the most important part of the comment above: the concept of a name is so overloaded that unless you ask for the string to use for the specific purpose you have in mind, there is very little you can do with it.
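As a rough Python sketch of what "ask per purpose" could look like in code: the `Purpose` values and the `NameRecord` class are hypothetical, and the example strings are invented. The one deliberate design choice is that a lookup for a purpose nobody asked about fails loudly instead of guessing from another string.

```python
from enum import Enum
from typing import Dict


class Purpose(Enum):
    # Hypothetical purposes; a real application would define its own.
    LEGAL_DOCUMENT = "legal_document"
    GREETING = "greeting"
    INDEX_SORTING = "index_sorting"


class NameRecord:
    """Holds one user-supplied string per purpose."""

    def __init__(self) -> None:
        self._names: Dict[Purpose, str] = {}

    def set_name(self, purpose: Purpose, value: str) -> None:
        self._names[purpose] = value

    def name_for(self, purpose: Purpose) -> str:
        # No fallback: if the person was never asked, raise.
        if purpose not in self._names:
            raise KeyError(f"no name was collected for {purpose.value}")
        return self._names[purpose]


record = NameRecord()
record.set_name(Purpose.LEGAL_DOCUMENT, "Kurt Example")
record.set_name(Purpose.GREETING, "Curt")
print(record.name_for(Purpose.GREETING))  # Curt
```

Whether to fail or to fall back to some default is a product decision; failing just makes the missing question visible.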