Hacker News
by bloak 1048 days ago
At some point you might want to consider "pivoting" to a gigantic database of all dead humans: there would be fewer data protection and privacy issues!

I have sometimes wanted to trace people's ancestry, or, more often, the descendants (of the parents) of a person who died half a century ago. It's depressing how difficult it is to look these things up. Various companies try to sell access to public records, but I don't do this often enough to be interested in paying for a subscription.

With something like this you should really also publish exactly where the information came from. There's a big difference between "an anonymous contributor supplied this" and "this comes from a database that we downloaded from whatever.gov.uk on this date and here's a copy of that database in case you want to check".

Some things that almost everyone is already aware of but I'll mention them anyway:

* The concepts of "first name" and "last name" only apply to some cultures.

* Most people have more than one name: women who change their name when they get married, middle names that may or may not get mentioned, names that are frequently abbreviated ("Kate" might be "Kate" or "Catherine" or ...), punctuation and diacritics that may be modified or omitted, Macdonald/McDonald/Mac Donald/..., various ways of transcribing the same name from a different alphabet, ...
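
To make the second point concrete, here is a rough sketch of how a lookup key might fold together a few of those spelling variants. It assumes Python, and `surname_key` is a made-up helper for illustration, not anything the site is known to use. It only handles diacritics, punctuation, spacing, case, and the Mc/Mac prefix; nicknames, married names, and transliteration from other alphabets would each need their own data.

```python
import unicodedata


def surname_key(surname: str) -> str:
    """Return a crude comparison key for a surname (hypothetical helper)."""
    # Decompose accented characters, then drop the combining marks (ü -> u).
    decomposed = unicodedata.normalize("NFKD", surname)
    unaccented = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case-fold and keep letters only, which removes spaces, hyphens,
    # and apostrophes ("O'Brien" -> "obrien", "Mac Donald" -> "macdonald").
    letters = "".join(ch for ch in unaccented.casefold() if ch.isalpha())
    # Assumption: a leading "mc" is just a spelling variant of "mac".
    if letters.startswith("mc"):
        letters = "mac" + letters[2:]
    return letters


# All three spellings from the list above collapse to the same key.
variants = ["Macdonald", "McDonald", "Mac Donald"]
print({surname_key(v) for v in variants})  # {'macdonald'}
```

A key like this is only good for proposing candidate matches. Two different people can easily share a key, so it should not be used to merge records automatically.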


Thank you for your comments. We're well aware of the pitfalls you're pointing out. Some of them can be avoided now; others will need a dose of AI down the road. For now, we log the data when we find it. Mentioning the sources is a tricky issue. Our philosophy is to say exactly as much, or as little, about everyone. Barack Obama's record is no more developed than yours. Linking a record to Wikipedia or to the list of Minnesota's sex offenders would break that rule, and not in a good way, in my opinion.
I didn't immediately understand the point you're making there because I don't think you'd ever need to use Wikipedia or a list of sex offenders as a source, but I think I see your point now: if for 98% of people the specified source is a government register of births then anyone who doesn't have that source mentioned will stick out and an astute reader will immediately infer that they were born in a place where the register of births is not easily accessible or they have changed their name or something like that. So mentioning the sources is, as you say, a tricky issue.