Hacker News
by awenger 1536 days ago
Complete here means the full end-to-end sequence of all chromosomes in a single human cell line named CHM13. The typical human cell has 46 chromosomes, in 23 pairs (one from our mother, one from our father) named chromosome 1, chromosome 2, and so on. This CHM13 cell line is special in that the two chromosomes in each of its pairs are (nearly) identical. Each chromosome is a long string of A, C, G, T nucleotides. So, this complete genome is a full set of 23 sequences without any "not sure" positions or "gaps" in the sequence.
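To make the "no gaps" idea concrete, here is a minimal Python sketch. It assumes the common convention that unresolved positions in an assembly are written as `N`; the function name `summarize_sequence` and the toy sequences are made up for illustration and do not come from the paper.

```python
from collections import Counter


def summarize_sequence(seq: str) -> dict:
    """Count resolved bases (A, C, G, T) and unresolved positions in a sequence."""
    counts = Counter(seq.upper())
    resolved = sum(counts[base] for base in "ACGT")
    # Anything that is not A/C/G/T (for example the placeholder N) is unresolved.
    unresolved = len(seq) - resolved
    return {
        "length": len(seq),
        "resolved": resolved,
        "unresolved": unresolved,
        "complete": unresolved == 0,
    }


# Hypothetical toy sequences: one with a run of N placeholders, one without.
gapped = "ACGTNNNNNNACGT"
gapless = "ACGTTTGACAACGT"

print(summarize_sequence(gapped))
print(summarize_sequence(gapless))
```

In this toy picture, a "complete" genome is one where this kind of check reports zero unresolved positions for every chromosome.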

One common analogy is to consider the genome sequence (a.k.a. assembly) as a map. Since the initial publication of the human genome in the early 2000s, most regions of human DNA have been known in full resolution. Other portions, most prominently the repetitive centromeres that lie at the middle of chromosomes, have remained unmapped. It was known that they exist, approximately how big they were, and which types of sequences lay inside, but the full order of the sequence had never been determined for any human genome until this work.

You could consider the genome like the earth and the centromeres like a dense rainforest. Previously we had detailed maps of most of the earth, and we had mapped the boundaries of the rainforest and had satellite-level images (i.e. we knew it was full of plants). Now we have on-the-ground pictures with full detail.

Having a map of these sequences makes them accessible to study. One of the most valuable uses of the human genome is as a shared coordinate system used by scientists to compare different individuals and identify and name genetic variants that explain human traits. We lacked that coordinate system for a big chunk of the genome until now.
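As a rough illustration of what "shared coordinate system" means, the sketch below names single-base differences by chromosome, position, and reference versus observed base. Everything here is hypothetical: the `Variant` type, the `find_snvs` helper, and the toy sequences. It also assumes the sample is already aligned to the reference and has the same length, which real pipelines cannot assume, and it uses 1-based positions as is common in variant formats.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str  # chromosome name in the reference
    pos: int    # 1-based position on that chromosome
    ref: str    # base in the reference
    alt: str    # base observed in the sample


def find_snvs(chrom: str, reference: str, sample: str) -> List[Variant]:
    """List single-base differences between a sample and the reference.

    Assumes both sequences are pre-aligned and of equal length (a toy simplification).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length in this sketch")
    return [
        Variant(chrom, i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample), start=1)
        if r != s
    ]


# Hypothetical toy data: the sample differs from the reference at position 4.
reference = "ACGTACGT"
sample = "ACGAACGT"
print(find_snvs("chr1", reference, sample))
```

Without a reference sequence for a region there are no positions to report, which is why variants in the previously unmapped regions could not be named this way.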

As you say, this paper reports the sequence of a single human cell line named CHM13. Each of us has a slightly different genome sequence (really two of them, one from each parent). Now when scientists sequence the genomes of more individuals, they can look at these regions that were previously ignored. Certainly understanding those regions will improve our understanding of human biology. Exactly how much remains to be seen.

3 comments

> So, this complete genome is a full set of 23 sequences without any "not sure" positions or "gaps" in the sequence.

Well, not quite: there is still a lot of ambiguity and compression in centromeres. But I agree that we are almost there.

What's a cell line, and do we know anything about who CHM13 is?
CHM13 is from a "complete hydatidiform mole" (https://en.wikipedia.org/wiki/Molar_pregnancy), and the paper says "Local ancestry analysis shows that most of the CHM13 genome is of European origin, including regions of Neanderthal introgression, with some predicted admixture". Fig. 1 shows a cool breakdown of the regions of the genome with different ancestries.
Seems to be an immortalized (telomerase*-transformed) cell line from a female fetus with near-complete homozygosity (https://sites.google.com/ucsc.edu/t2tworkinggroup/chm13-cell...).

* Telomerase is a reverse transcriptase that allows cells to achieve replicative immortality (https://academic.oup.com/hmg/article/9/3/403/715108).

> The typical human cell has 46 chromosomes, in 23 pairs

Mitochondria have their own DNA, which is also sequenced.