by zwischenzug 3248 days ago
Coincidentally, I was talking to my father about getting his family tree work 'stored' in some digital format for future generations.

It struck me that a simple directory structure, with one directory per person holding text files of info about them, all tar'd up, would be about the most future-proof format you could get.
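For concreteness, here is a rough sketch of that layout in Python. Everything in it is an assumption for illustration: the `family/<person-id>/info.txt` layout, the `key: value` line format, and the helper names `write_person`, `archive_tree` and `demo` are made up, not any standard.

```python
import tarfile
import tempfile
from pathlib import Path


def write_person(root: Path, person_id: str, fields: dict) -> Path:
    """Create root/<person_id>/info.txt with one 'key: value' line per field."""
    person_dir = root / person_id
    person_dir.mkdir(parents=True, exist_ok=True)
    info = person_dir / "info.txt"
    lines = [f"{key}: {value}\n" for key, value in fields.items()]
    info.write_text("".join(lines), encoding="utf-8")
    return info


def archive_tree(root: Path, archive: Path) -> list:
    """Tar up the whole tree and return the sorted member names."""
    with tarfile.open(archive, "w") as tar:
        tar.add(root, arcname=root.name)
    with tarfile.open(archive) as tar:
        return sorted(tar.getnames())


def demo() -> list:
    """Build a two-person tree in a temp dir (hypothetical data) and archive it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "family"
        write_person(root, "john-doe", {"name": "John Doe", "born": "1900"})
        write_person(root, "jane-roe", {"name": "Jane Roe", "spouse": "john-doe"})
        return archive_tree(root, Path(tmp) / "family.tar")
```

Note that relationships here are just free-text references to another directory name (`spouse: john-doe`), so nothing checks that the target exists.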

Anyone else thought about this?

2 comments

That might be the "easiest", but it's not the most portable, and would be a nightmare to navigate. I'd recommend he work in a software package that can export to GEDCOM. That's a structured data format used by genealogists to track relationships and metadata. The current version is 21 years old, and the original spec was designed in 1984. It's robust and industry-standard. Every single genealogical product worth its salt will be able to work with that format (and export to it).

I hate to recommend them, but ancestry.com has one of the best interfaces in a SaaS product, with a ton of data on the back-end if you are doing research. An actual application (Family Tree Maker comes to mind, but don't quote me or take that as a recommendation) might be better for just "documenting" research.

I'm not just being flippant, but GEDCOM is a dinosaur. Can you imagine Facebook storing its relationship data in GEDCOM? Unthinkable. Genealogy most certainly needs a new standard.

Also, I know people who work for Ancestry, and the general thinking is that it's hard to update old interfaces for a newer generation when a very important part of the user base is 50+.

Oh it's a nasty format, to be sure, and more useful for "cold storage" than something you'd actively work in. But for a long-term storage requirement? It's reasonably human readable and eminently portable as the go-to standard.

You can import a GEDCOM into any family tree software and be up and running in minutes. A tarball'd directory structure would be more of a nightmare.

On Ancestry.com, the UI is fine, and still better than anything else I've seen. If you have any other ideas, I'd love to hear them. My issue with Ancestry is primarily their model that actively hinders collaboration between researchers unless you've paid their toll. They're taking your research and charging others for the privilege of seeing it. Given the open and helpful nature of genealogical researchers (see also: "search angels"), this rubs me the wrong way.

(So I use Ancestry for research and tree management—and I pay them $200/yr for access to their data sources—then I export (what else?) a GEDCOM, route it through a couple scripts of my own making, and publish it on my personal site.)

The GEDCOM file format is basically nested documents, and it uses graph-like pointers for linking things, so it is not that bad.
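To illustrate the nesting and the pointers: each GEDCOM line is a level number, an optional `@xref@` id, a tag, and an optional value, and a value like `@I1@` points at another record. Below is a minimal Python sketch of a reader. The sample records are made up, the function names `parse_gedcom` and `first` are hypothetical, and it ignores parts of the real spec such as CONT/CONC continuation lines and character encodings.

```python
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Doe/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Jane /Roe/
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
"""


def parse_gedcom(text: str) -> dict:
    """Parse GEDCOM lines into nested nodes; return level-0 records keyed by xref."""
    records = {}
    stack = []  # stack[i] is the most recent node seen at level i
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(" ", 2)
        level = int(parts[0])
        xref = None
        if len(parts) > 1 and parts[1].startswith("@"):
            # Record line: level, xref id, then tag (and maybe a value)
            xref = parts[1]
            rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
            tag, value = rest[0], (rest[1] if len(rest) > 1 else "")
        else:
            tag = parts[1] if len(parts) > 1 else ""
            value = parts[2] if len(parts) > 2 else ""
        node = {"tag": tag, "value": value, "children": []}
        del stack[level:]  # drop anything at this depth or deeper
        if stack:
            stack[-1]["children"].append(node)
        stack.append(node)
        if level == 0 and xref:
            records[xref] = node
    return records


def first(node: dict, tag: str) -> str:
    """Return the value of the first child with the given tag, or ''."""
    for child in node["children"]:
        if child["tag"] == tag:
            return child["value"]
    return ""
```

Following a pointer is then just a dictionary lookup, e.g. `records[first(records["@F1@"], "HUSB")]` gives the individual record the family's HUSB line refers to.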

The actual data model is terrible though, as it was created by the LDS Church based on their 'traditional' style of family.

However, most programs do not use GEDCOM to store their data (they might use an SQL database for example), but merely as a way to transfer data between programs.

Also note that there is always someone working to make a replacement for GEDCOM. Some of the latest efforts are GEDCOM-X (made by the church again), and the work by [FHISO](http://fhiso.org/). A lot of such projects, e.g. BetterGEDCOM, get bogged down in 'internal' politics and rarely get anywhere.

Is it robust? I help my mom with genealogy stuff from time to time, and the GEDCOM file we got from some cousins was, let us say, a mess.

I agree it is widely supported.

One thing to watch out for with export is whether you are working with data that doesn't make it into the GEDCOM file.

As a file format, sure, it's robust. Garbage-in-garbage-out, though. A directory structure would be just as messy, and harder to use.

And great point on export. Assets like pictures are an example: you'd still need to store them somewhere. (IIRC, GEDCOM format points to a file path of some sort.)

Edit: actually, you can embed multimedia (as a BLOB of some sort) directly into a GEDCOM, but I'd leave the files separate to maintain human readability.

You're thinking the same things I do.

I should have read your comment before posting my shameless self-promotion, but here I am posting my super simple, customizable and future-proof tool again: https://github.com/fiatjaf/rel

(read about it in my comment at https://news.ycombinator.com/item?id=14849823)