by yellowcake0
1028 days ago
From the preprint, it looks like they sequenced the Y chromosome of HG002 (NA24385), the long-standing Genome in a Bottle reference sample, which is still held in deep freeze at a number of biobanks. Short-read sequencing data is notoriously bad for reconstructing the low-complexity and repetitive regions of genomes, so until recently the most commonly used reference genomes left many of these regions "dark". According to the preprint, the Y chromosome has the highest density of these low-complexity regions. It is also something of a bioinformatic nuisance when constructing a generic human reference genome, since it is present in only about 50% of the population.
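
A toy sketch of why short reads struggle with repeats (an illustration only, not the preprint's method): it builds a hypothetical 140 bp "genome" with random flanks around a tandem repeat of AGCT, then counts how many places a 12 bp read matches exactly. A read from unique sequence typically lands in one place, while a read from inside the repeat matches at least a dozen positions, so there is no way to tell where it came from. The read length, repeat unit and flank sizes are arbitrary assumptions; real short reads are on the order of 150 bp and real repeats can be far longer than that.

```python
import random


def match_positions(genome: str, read: str) -> list[int]:
    """Return every start index where `read` occurs exactly in `genome` (overlaps allowed)."""
    return [i for i in range(len(genome) - len(read) + 1)
            if genome[i:i + len(read)] == read]


random.seed(0)
# Hypothetical toy genome: unique random flanks around a 60 bp tandem repeat of "AGCT".
left = "".join(random.choice("ACGT") for _ in range(40))
repeat = "AGCT" * 15
right = "".join(random.choice("ACGT") for _ in range(40))
genome = left + repeat + right

read_len = 12
unique_read = genome[5:5 + read_len]    # drawn from the random left flank
repeat_read = genome[45:45 + read_len]  # drawn from inside the tandem repeat

print(len(match_positions(genome, unique_read)))  # usually 1: placement is unambiguous
print(len(match_positions(genome, repeat_read)))  # 12 or more: placement is ambiguous
```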
I wouldn't call random data 'complex', but it is easy to assemble from short reads.
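
One way to see why random sequence assembles easily: nearly every k-mer in it is unique, so overlapping reads can be stitched together without ambiguity, whereas a tandem repeat collapses into a handful of distinct k-mers. The sketch below uses the fraction of distinct k-mers as a rough proxy; that metric, k = 21 and the 10 kb test sequences are illustrative assumptions, not how any particular assembler scores complexity.

```python
import random


def distinct_kmer_fraction(seq: str, k: int) -> float:
    """Fraction of the k-mers in `seq` that are distinct; 1.0 means every k-mer is unique."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return len(set(kmers)) / len(kmers)


random.seed(1)
random_seq = "".join(random.choice("ACGT") for _ in range(10_000))
repeat_seq = "AGCT" * 2_500  # same length, pure tandem repeat

k = 21
print(distinct_kmer_fraction(random_seq, k))  # very close to 1.0
print(distinct_kmer_fraction(repeat_seq, k))  # only 4 distinct 21-mers out of 9980
```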