by Ericson2314 3677 days ago
The genome percentages in the article are a bit confusing. It is oft-quoted that we are "99% Chimpanzee", so how could we also be no more than "6% Denisovan"? Answer: they really mean 6 percentage points "realigned" vs the African reference populations. I've read many other articles repeating the "6%" and it would be nice if they (and this) were clearer (and used the correct units--but maybe that's too much to ask).

Some cool charts for reference: http://www.scientificamerican.com/article/tiny-genetic-diffe...


The 99% is basically wrong. What it is referring to is that for 99% of genes you can find in humans you can find an ortholog in chimps. It tells you nothing about the function of the genes in the two species and is basically meaningless from a functional perspective.
I'm no geneticist, but isn't this "structural" number still valuable for other purposes? Also, isn't the 6pp figure "structural" too, so that, modulo my previous complaints, we're comparing like measurements?
Yes it can help you work out how related two species are to each other, but as a means of knowing what an individual gene does in each species it can be highly misleading. Ask any pharmaceutical researcher how much the concept that sharing a gene means sharing a function can lead you astray.
Out of curiosity, how do you take the modulo of an (English) argument?
https://en.m.wikipedia.org/wiki/Modulo_(jargon). In ignoring the differences I already pointed out ("my previous complaints"), I am defining an equivalence relation.
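To make the jargon concrete, here is a minimal Python sketch, assuming each figure is modeled as a small record with hypothetical `kind` and `units` fields. "Equal modulo X" just means "equal once X is thrown away"; defining equality as "same after normalizing" always yields an equivalence relation (reflexive, symmetric, transitive).

```python
# Hypothetical records: the field names and values are illustrative only.

def normalize(record: dict, ignored: set) -> dict:
    """Drop the fields whose differences we have agreed to ignore."""
    return {key: value for key, value in record.items() if key not in ignored}


def equal_modulo(a: dict, b: dict, ignored: set) -> bool:
    """a ~ b iff the records agree on every field that is not ignored."""
    return normalize(a, ignored) == normalize(b, ignored)


# The arithmetic original: 17 and 5 are equal modulo 12 (same remainder).
print(17 % 12 == 5 % 12)  # True

chimp_figure = {"kind": "structural", "units": "percent"}
denisovan_figure = {"kind": "structural", "units": "percentage points"}

print(equal_modulo(chimp_figure, denisovan_figure, ignored=set()))      # False
print(equal_modulo(chimp_figure, denisovan_figure, ignored={"units"}))  # True
```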
Cool, thanks
Does this mean that if there are 3 genes a, b, c in mammal 1 and only a, b in mammal 2, the lack of c can lead to a significant difference in the role/interpretation/function of a and b?
That's certainly a possibility, but you don't even need to get that complicated. Gene A in humans could have a slight difference in its DNA sequence that causes it to work differently from the chimpanzees' copy of the same gene.
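A toy sketch of that point, with everything hypothetical: a made-up 100-base "gene" in two species that differs at a single position, plus a made-up rule under which that one position decides whether the gene "works". Percent identity comes out at 99%, yet the toy behavior differs. This is an illustration of the argument, not a model of real gene function.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions that match in two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def toy_activity(seq: str) -> str:
    """Hypothetical rule: the 'gene' only works if position 42 is a G."""
    return "active" if seq[42] == "G" else "inactive"


# Made-up 100-base sequences, identical except at index 42 ('G' vs 'A').
human = "ACGT" * 25
chimp = human[:42] + "A" + human[43:]

print(percent_identity(human, chimp))            # 99.0
print(toy_activity(human), toy_activity(chimp))  # active inactive
```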
It gets even more complicated than this: the gene could be identical in sequence and still have a different function because of a change somewhere else in the genome. The genome is the ultimate spaghetti code, the result of 3.5 billion years of hacks upon hacks upon hacks.
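The idea of an unchanged gene behaving differently because of a change elsewhere can be sketched in Python as an analogy only. Here two hypothetical "species" share literally the same compiled code object for a made-up `gene_a`, bound with `types.FunctionType` to different global namespaces, so the only difference is a hypothetical `REGULATOR` value defined outside the function.

```python
import types


def _gene_a_template(signal: float) -> float:
    # Looks up REGULATOR in whatever global namespace this code is bound to.
    # The template itself is never called directly.
    return signal * REGULATOR


# Hypothetical "genomes": the only difference is a value outside gene_a.
human_genome = {"REGULATOR": 1.0}
chimp_genome = {"REGULATOR": 0.5}

# Same compiled code ("identical sequence"), different surrounding state.
human_gene_a = types.FunctionType(_gene_a_template.__code__, human_genome, "gene_a")
chimp_gene_a = types.FunctionType(_gene_a_template.__code__, chimp_genome, "gene_a")

print(human_gene_a.__code__ is chimp_gene_a.__code__)  # True
print(human_gene_a(10.0), chimp_gene_a(10.0))          # 10.0 5.0
```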
Different sequence but same gene? What is a 'gene', then? A position on the DNA strand? A percentage of similarity?
Let's put it in git terms. Humans and chimps are derived from a common ancestor, so you can imagine them as two diverging forks of the same codebase, with their common ancestor being the last commit that they both share. Since then, either or both could have performed any sequence of additions, deletions, edits, duplications, etc., including cutting large sections out of one file and pasting them into another. However, for chimps and humans, the fork was recent enough that the vast majority of lines of code have been untouched since the fork, so for the vast majority of lines of human code, you can identify the corresponding line of chimp code, and vice versa. So when we talk about "Gene A" in humans and "Gene A" in chimps, this is presuming that the region of the genome containing Gene A is relatively unchanged between the two species. However, as you and others have pointed out, just because the gene's sequence is identical or nearly identical in the two species doesn't guarantee that it performs the same function in the two species. You could make an analogy to modifying a global variable in a completely different file that drastically changes the behavior of a function, even though that function's source code is unchanged.
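The "find the corresponding line in the other fork" step can be sketched with Python's standard-library `difflib.SequenceMatcher`, which finds matching blocks between two sequences. The toy forks are hypothetical, and this is not how genome aligners actually work; it only shows how untouched lines can still be paired up after edits and deletions shift their positions.

```python
import difflib


def corresponding_lines(fork_a: list, fork_b: list) -> dict:
    """Map line indices in fork_a to line indices in fork_b for lines that
    are identical in both forks and appear in the same relative order."""
    matcher = difflib.SequenceMatcher(a=fork_a, b=fork_b, autojunk=False)
    mapping = {}
    # get_matching_blocks() ends with a dummy (len(a), len(b), 0) entry,
    # which contributes nothing because its size is 0.
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset] = block.b + offset
    return mapping


# Hypothetical forks of a common ancestor ["A", "B", "C", "D", "E", "F"]:
# one fork edited line C and deleted line D; the other left both alone.
human = ["A", "B", "C (edited)", "E", "F"]
chimp = ["A", "B", "C", "D", "E", "F"]

print(corresponding_lines(human, chimp))  # {0: 0, 1: 1, 3: 4, 4: 5}
```

The edited line (index 2 in `human`) has no partner, and the jump from index 3 to 4 reflects the deleted line.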

So roughly speaking, a gene is a region of DNA that operates as a functional unit. The best-understood function is encoding a protein product and regulating its production. And when we talk about the "same" gene in different species, we're using "same" informally to refer to the genes in the two species that are derived from the same gene in their common ancestor. Usually, but not always, these genes perform the same or similar functions in the two species. Unlike a function in computer code, however, the bounds of a gene are not well-defined, and can overlap other genes or be non-contiguous.
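One way to picture a gene with non-contiguous bounds that can overlap its neighbors is as a set of intervals rather than a single range. A toy sketch in Python, assuming made-up coordinates and half-open `[start, end)` intervals; the `Gene` class and its methods are hypothetical and do not correspond to any real annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A hypothetical gene as one or more half-open [start, end) intervals."""
    name: str
    segments: tuple

    def length(self) -> int:
        # Total bases covered, skipping the gaps between segments.
        return sum(end - start for start, end in self.segments)

    def overlaps(self, other: "Gene") -> bool:
        # True if any segment of self intersects any segment of other.
        return any(s1 < e2 and s2 < e1
                   for s1, e1 in self.segments
                   for s2, e2 in other.segments)


gene_x = Gene("gene_x", ((100, 150), (300, 380)))  # non-contiguous: gap 150..300
gene_y = Gene("gene_y", ((170, 260),))             # sits in gene_x's gap
gene_z = Gene("gene_z", ((360, 420),))             # overlaps gene_x's 2nd segment

print(gene_x.length())          # 130
print(gene_x.overlaps(gene_y))  # False
print(gene_x.overlaps(gene_z))  # True
```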

For more: https://en.wikipedia.org/wiki/Homology_%28biology%29#Sequenc...

I think so. That, plus epigenetics. The "one gene, one protein" slogan one learns in grade school egregiously plays down the complexity, even if it's not false per se.
I didn't study biology, so my 'knowledge' is limited to the approximations I was fed before. But as a programmer, I can imagine that a change at lower layers can impact upper layers tremendously. I'd like to read more about gene interrelation / dynamics.