Hacker News
by Swenrekcah 1790 days ago
In the abstract of the paper [0], they say the differences are 1.5% to 7%. So the lower end of that range lines up with the 98.x% similarity.

Then in the paper there is this paragraph:

> Our ARG strategy allows us to bin the human genome into regions containing archaic admixture in at least some humans, regions of ILS, and regions free of both archaic admixture and ILS in all humans (hereafter archaic “deserts”). We find that approximately 7% of the human autosomal genome is human-unique and free of both admixture and ILS. Roughly 50% of the human genome contains regions where one or more humans has archaic ancestry obtained through admixture. If deserts are further restricted to regions that contain a high-frequency, human-specific derived allele, i.e., a substitution that can be assigned to the human lineage (hereafter “human-specific regions”), then these comprise only 1.5% of the assayed genome (Fig. 4A).

Maybe someone here understands what these words mean and can clarify?

[0]: https://advances.sciencemag.org/content/7/29/eabc0776

1 comment

I asked the author (about 20 years ago I was a postdoc in the lab where he was a grad student).

My question: """Are these differences evaluated/inferred using data from all regions of the genome (intergenic, viral repeats, etc) or just genes? I recall that the early reports that compared primates to humans just used genes (or maybe just the easily aligned regions, but out of order) which seemed like a big omission."""

His answer: """We used the Simons Genome Diversity panel (full phased genomes for ~300 people), along with Neanderthal and Denisovan genomes to make an ancestral recombination graph (ARG). The ARG is a sequence of trees describing relationships between everyone all along the genome. It's really just a sequence of trees at each variable site. Then, you can look at these trees and find segments where the archaics fall outside the variation of the humans. These are regions where no human shares ancestry with archaic either by recent admixture or by incomplete-lineage sorting. Turns out that's about 7% of the genome. What's in that 7%? It's a lot of genes and specifically a lot of genes involved in neural development and neural function! The method itself is blind to what is genic or nongenic. But this method is about the genealogy of genes across the genome, i.e., from whom they were inherited and not necessarily how different the versions were. In other words, it's about the topology of the trees across the genome, not their branch lengths."""
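To make the "topology, not branch lengths" point concrete, here is a toy sketch of the kind of check he describes. It is not the paper's pipeline: the nested-tuple tree format, the sample names (`H1`, `H2`, `H3`), the coordinates, and the helper names are all made up for illustration, and it does not try to tell admixture apart from ILS. It only asks, for each local tree, whether all the humans form a single clade that excludes both archaics, and then reports what fraction of the toy "genome" passes.

```python
# Toy illustration only: trees are nested tuples of leaf names, and every
# name and coordinate below is hypothetical, not data from the paper.
ARCHAICS = {"Neanderthal", "Denisovan"}


def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the leaf set of every subtree, including the root."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def archaics_outside_humans(tree):
    """True if all humans form one clade that contains no archaic."""
    humans = leaves(tree) - ARCHAICS
    return any(clade == humans for clade in clades(tree))


# A made-up "ARG": (start, end, local tree) for consecutive segments.
LOCAL_TREES = [
    # Humans cluster together; archaics sit outside them.
    (0, 700, ((("H1", "H2"), "H3"), ("Neanderthal", "Denisovan"))),
    # H2 is closer to the Neanderthal than to the other humans.
    (700, 9000, (("H1", ("H2", "Neanderthal")), ("H3", "Denisovan"))),
    # H1 pairs with the Neanderthal.
    (9000, 10000, ((("H1", "Neanderthal"), ("H2", "H3")), "Denisovan")),
]


def desert_fraction(local_trees):
    """Fraction of total length where archaics fall outside all humans."""
    total = sum(end - start for start, end, _ in local_trees)
    desert = sum(
        end - start
        for start, end, tree in local_trees
        if archaics_outside_humans(tree)
    )
    return desert / total


if __name__ == "__main__":
    print(desert_fraction(LOCAL_TREES))
```

Note that the check never looks at how long any branch is, only at who groups with whom, which is the distinction he draws at the end of his answer. The segment lengths here were picked so that the toy result matches the 7% figure; they carry no other meaning.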

Beyond that, things start to get really complicated; you need to understand concepts like haplotype blocks, how new genes arise, etc.