by trainfromkansas 2712 days ago
None of the differences, either split by company or by twin, seem egregious or even particularly unsatisfactory. They cite that AncestryDNA measured the twins' DNA to be 99.6% similar, and really, that seems about like the error rate you would expect for the Ancestry results they got. They got the exact same regions and were only a percentage point or two off in each region's share.

It's not like they're fully sequencing every DNA sample they get.

> AncestryDNA found the twins have predominantly Eastern European ancestry (38 per cent for Carly and 39 per cent for Charlsie). But the results from MyHeritage trace the majority of their ancestry to the Balkans (60.6 per cent for Carly and 60.7 per cent for Charlsie).

This part of the article especially seems like hair splitting considering the Balkans and Eastern Europe tend to have a lot of overlap. In fact, "Balkans" in particular is an extremely ambiguous linguistic term and can mean so many different things to so many people.


What's unsatisfactory is the way these companies present these results which implies a level of accuracy and precision which is unwarranted.

"She also has French and German ancestry (2.6 per cent) that her sister doesn't share."

I assume from this that 23andMe is reporting the results to this level of accuracy (a tenth of a percent). For the sister who got 2.6% they are strongly implying a certainty that she has a small amount of French and German ancestry rather than what actually seems to be the case which is a small chance of some French and German ancestry.

Since it seems their results are quite lacking in both precision and accuracy they should do a better job of reporting them and advertising them in a way that makes that clear.

I can only speak for myself, but when I got one of these tests 6 years ago, I never got such an impression. They've updated my results several times since and I never feel hoodwinked as if I had been previously deceived; I'm always just curious to see what the latest estimates are.

> I assume from this that 23andMe is reporting the results to this level of accuracy (a tenth of a percent). For the sister who got 2.6% they are strongly implying a certainty that she has a small amount of French and German ancestry rather than what actually seems to be the case which is a small chance of some French and German ancestry.

You picked out the single biggest discrepancy they report in this article, and yet even that doesn't seem to indicate an actual inconsistency to me. I would tend to interpret those results to mean that for one twin, they felt sufficiently confident to call a portion of their DNA "French and German", while in the other it fell below that confidence threshold so it got thrown into the "Broadly European" share.

This process already pretty much precludes objective perfection just based on what they're reporting. What does it mean for 2.6% of your ancestry to come from France? During what time period? What if it's on the border in one of those regions that switched hands several times? These are subjective decisions they have to make, but it doesn't invalidate the whole test just because sometimes you have enough evidence to call a portion of your DNA "French and German" and sometimes you have to fall back to just calling it "Broadly European".
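To make the threshold idea concrete, here is a minimal sketch of that kind of fallback. It is not 23andMe's actual method: the function `label_segment`, the 0.8 threshold, the scores, and the second region name are all made up for illustration.

```python
def label_segment(scores, threshold=0.8, fallback="Broadly European"):
    """Return the best-scoring region, or the fallback if it is not confident enough."""
    # Pick the region with the highest (hypothetical) confidence score.
    region, confidence = max(scores.items(), key=lambda item: item[1])
    return region if confidence >= threshold else fallback


# Hypothetical scores for the same stretch of DNA in each twin.
twin_a = {"French and German": 0.82, "British and Irish": 0.18}
twin_b = {"French and German": 0.78, "British and Irish": 0.22}

print(label_segment(twin_a))  # French and German
print(label_segment(twin_b))  # Broadly European
```

Under these assumptions, a small difference in score near the cutoff is enough to flip the label, even though the underlying evidence is nearly identical.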

> This process already pretty much precludes objective perfection just based on what they're reporting. What does it mean for 2.6% of your ancestry to come from France? During what time period? What if it's on the border in one of those regions that switched hands several times? These are subjective decisions

I think that’s the whole point of the complaint here. These companies pretend to have found overly precise results (which I’m sure are explained away to nothing in the fine print) and yet are subjective and lean heavily on guesswork.

23andMe told me that I was a quarter Italian. They seem to have mixed that up with what I actually am: 50% Middle Eastern. For a while I wondered if there was some funny business in the family, but then my sister got the "right" percentages. This seems like a pretty huge error to me.
I know it might be assumed from the context of the post and thread, but are you talking about your twin sister or not? Just pointing out that the chance of funny business (in your family or at your birth or at these companies) is unknown but not necessarily zero.
Well, Italian and ME populations were basically the same thing for more than a thousand years (basically until the Islamic Conquest promoted piracy on the Mediterranean, making it fundamentally unsafe). So it might well be that you are carrying genes that most of the Italian population also carries, and vice versa.
Is it possible one of you has a father who's 100% Middle Eastern, and the other has a father who's 50% Italian and 50% Middle Eastern?
I would argue that this is why significant digits count.

You can't say something is 2.1374cm long when your error is +/-1cm. You can say it is 2cm long though.
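As a rough sketch of that convention (the helper name `round_to_uncertainty` is made up here, and keeping one significant digit of the error is just one common choice):

```python
import math


def round_to_uncertainty(value, uncertainty):
    """Round value to the decimal place of the uncertainty's leading digit."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Position of the uncertainty's leading digit: 0 for units, -1 for tenths, ...
    exponent = math.floor(math.log10(uncertainty))
    # Keep digits only down to that position.
    return round(value, -exponent), round(uncertainty, -exponent)


print(round_to_uncertainty(2.1374, 1))  # (2.0, 1)
```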

Actually, you can. It's just a custom to do differently.

But if you estimate your first moment is 2.1374 and your second moment is 1, I want to know just that, because those are the numbers I'll plug into my equations. I will decide to round if and when necessary.

You're implying that by rounding before handing off the numbers, valuable information is lost. Actually, the reverse is true - the rounding itself contains valuable information (how many digits are actually relevant) which is a hidden property of the process that obtained them. Unless you provide it through a side channel, failing to round hides this information, for no benefit at all (because the digits you provide are known to be meaningless).

The only time this wouldn't be true is if you mistrust the significant figure calculation of the source, and believe you have sufficient insight to do better.

Yes, I do. In general, I stick to simple things. Enough samples and everything is normal, god bless the law of large numbers.

If the precision of your estimation is not a direct function of the standard deviation, but is a "hidden property of the process that obtained" it, we have much bigger problems than losing "valuable information".

> If the precision of your estimation is not a direct function of the standard deviation, but is a "hidden property of the process that obtained" it

I think you're confusing different types of error. There is error between measurements and an inherent error to the device you use to measure. There's also a difference between precision and accuracy.

Standard deviation measures the spread across multiple measurements. For example, if you measure something 10 times and get 51mm every time, then your standard deviation is 0.

But that doesn't mean you have no error.

The "property of the process that obtained it" is not hidden. A simple case is a ruler. You have lines on the ruler that tell you certain intervals. If the smallest interval on your ruler is 1mm, then all your calculations can be made to +/- 1mm (that is, up to 30.5cm on a standard 12in ruler). There is nothing hidden about this. All that is being said here is that your measuring device is not perfect.

So using the two errors, we have a measurement of 51mm +/- 1mm (or frequently in a short hand you'd just say 51mm). It would in fact be deceptive to say that your measurement was 51.0mm, because that implies that you have more precision than you actually have (implying that you have on the order of +/- 0.1mm precision).
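A small sketch of the two error sources, using the ruler numbers above. Taking the larger of the two as the reported error is an assumption made for this example, not the only way to combine them.

```python
import statistics

readings_mm = [51] * 10   # ten repeated readings of the same object
resolution_mm = 1         # smallest interval marked on the ruler

# Scatter between readings: zero here, because every reading is identical.
spread_mm = statistics.pstdev(readings_mm)

# Assumption for this sketch: report whichever error source is larger.
reported_error_mm = max(spread_mm, resolution_mm)

print(f"{statistics.mean(readings_mm)}mm +/- {reported_error_mm}mm")
```

The standard deviation comes out as zero, yet the reported error is still 1mm because the ruler cannot resolve anything finer.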

Back in high school in physics we would get minus points if we indicated a too high precision in the numbers we used for calculations, it was considered plain wrong to say 2.232cm if you actually only were able to measure that it's roughly 2cm.
I'm not sure why anyone thinks significant digits in any way informs error rate, and it's a little hyperbolic to suggest 4 significant digits with a whole number error rate, when they did 1 significant digit with a ~3% to ?% error rate.
> I'm not sure why anyone thinks significant digits in any way informs error rate

In the majority of technical fields the standard is to round the result so that the significant digits do correspond to error rate.

https://www.ruf.rice.edu/~bioslabs/tools/data_analysis/error...

I'd expect that in academia for the most part. But this is a consumer product, so they aren't going to follow technical fields, and my hunch is that's for marketing reasons (it looks more authentic).

But even then, rounding to the error rate is a standard practice, but that certainly doesn't mean skipping it is wrong or even unethical; it's just pointless not to. Unless of course your motivation is driven by marketing/sales.

> it looks more authentic

One might call that deception

> This part of the article especially seems like hair splitting considering the Balkans and Eastern Europe tend to have a lot of overlap. In fact, "Balkans" in particular is an extremely ambiguous linguistic term and can mean so many different things to so many people.

We don't have a good way to name the origins of populations. We use approximations based on nationalities, which is probably a really, really bad proxy. Take France, where there is probably not a single part with the same kind of gene pool (Germanic invasions from the north, Gallic roots in places that were untouched, Italian tribes from the south, and I could go on), so "French & German" as a split means absolutely NOTHING.

It doesn't mean _nothing_ -- it's just prone to the same type of problems and edge cases affecting pretty much every attempt at categorizing in a very complex system.
To me it means _nothing_ when the end-class you use is very ambiguous. A classifier should be as specific/differentiated as possible or else, it's useless.
You mentioned in another comment that haplogroups would be too foreign a classification for you, but that would likely be the most specific/differentiated classifier.

If somebody says they're a "quarter French" -- putting aside whether that's interesting or not -- do you believe that's meaningless? Do you demand that they specify whether they have Gallic roots, or descend from one of the Germanic tribes that invaded from the north 1500 years ago, or something else altogether?

I think the point of this test is to try to capture the spirit of what Americans have done for a long time, which is to try to describe their heritage in fractions using the nationality of a distant (but not too distant) past as a rough guide.

And in the same way that your genome might prove more complicated than naming fractions by nation, somebody might say "My grandfather came to the USA from France... but he was part of a second generation immigrant community from a Greek-speaking part of Sicily who emigrated en masse to Alsace, which was located in Germany, not France at the time".

So maybe the DNA test will call that French or German or Italian or Sicilian or Greek or whatever, and that might oversimplify it, and the test doesn't work as well as it does for somebody whose ancestors stayed in one location and were part of a largely homogeneous population, but that doesn't render the test meaningless. Part of the excitement people derive is that it can sometimes illuminate your ancestral history beyond what is well known. Using the previous example, somebody might not be aware of the complexity of their grandfather's heritage, and might be interested to understand that part of the family has roots in Sicily and not just France.

> We don't have a good way to name the origins of populations.

https://en.m.wikipedia.org/wiki/Haplogroup ?

Interesting. But it's far from being a well known denomination (layman-wise).
Disclaimer: I'm not an expert on the subject.

I remember watching PBS DNA documentaries[1] where it's mentioned in one of the episodes that DNA-wise there's only 1% difference between any two humans.

I looked for the same details today on genomenewsnetwork[2]; it has the following line:

> we are all 99.9 percent the same, DNA-wise. (By contrast, we are only about 99 percent the same as our closest relatives, chimpanzees.)

It seems like even an error of 1% would actually be the same as missing the whole difference between two people.

  [1] https://www.youtube.com/watch?v=apOP8MMedqE
  [2] http://www.genomenewsnetwork.org/resources/whats_a_genome/Chp4_1.shtml
A lot of genomic tests only sequence the single base pairs that happen to contribute to that 0.1 percent variability. The article notes that only a subset of the genome was sequenced, and that subset isn't chosen at random but is highly targeted to capture variance between populations.
These genomics services don't look at the parts of the genome that remain the same for everyone...
> In fact, "Balkans" in particular is an extremely ambiguous linguistic term and can mean so many different things to so many people.

Not really, Balkan countries are well-defined and have been so for over a century now. It's basically European countries that were under the Ottoman Empire:

Greece, Albania, North Macedonia, Bulgaria, Romania, Serbia, Kosovo, Montenegro, Croatia, Bosnia. One could also throw Slovenia in there but historically speaking, they have always been pretty different from other Balkan-folk.

"There is not universal agreement on the region’s components." [1]

"It can be difficult to define exactly which countries are included in the Balkan States. It is a name that has both geographic and political definitions, with some of the countries crossing what scholars consider the 'boundaries' of the Balkans." [2]

Almost the entire Wikipedia article is dedicated to explaining the various definitions of it and how it is a problematic term. [3]

[1] https://www.britannica.com/place/Balkans

[2] https://www.thoughtco.com/where-are-the-balkan-states-407024...

[3] https://en.wikipedia.org/wiki/Balkans

I'm from Bosnia, which is in the Balkans by every definition of the term. As for the rest of our neighbors, it's a mixed bag.

According to certain definitions, only the coast of Croatia is in the Balkans (without Istria peninsula). Vojvodina (the northern region of Serbia) should definitely not be in the Balkans. Slovenia definitely not. Romania? I can't tell without looking up the definitions.

See the problem?

Just for the sake of accuracy - Croatia was never "under" the Ottoman Empire. That was Herzegovina - so technically today's Bosnia.

Also not sure what you mean by Slovenia being historically different - Slavs, similar language, part of the Austro-Hungarian Empire, part of Yugoslavia, etc.