by yellowcake0 1028 days ago
From the preprint, it looks like they sequenced the Y chromosome of HG002, a long-established Genome in a Bottle reference sample (rather than one of the original 1000 Genomes samples), still held in deep freeze at a number of biobanks.

Short-read sequencing data is a notoriously bad data type for reconstructing the low-complexity / repetitive regions of genomes, because a read shorter than the repeat it falls in can't be placed unambiguously. As a result, until recently the most commonly used reference genomes left many of these regions "dark". According to the preprint, the Y chromosome has the highest density of these low-complexity regions. The Y is also something of a bioinformatic nuisance when constructing a generic human reference genome, as it's only present in roughly half of the population.

1 comment

Isn't the problem the absence of random DNA?

I wouldn't call random data 'complex', but it is easy to reconstruct when assembling short reads.
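
To make the point concrete, here is a rough toy sketch in Python. It is not how any real assembler works: it ignores sequencing errors, reverse complements, and paired ends, and the function name `unique_read_fraction`, the 30-base read length, and the "GATTA" motif are all made up for illustration. It just counts what fraction of error-free reads occur at exactly one position in a sequence, which is a crude proxy for how placeable those reads are.

```python
import random
from collections import Counter


def unique_read_fraction(genome: str, read_len: int) -> float:
    """Fraction of read start positions whose read occurs exactly once in genome."""
    # Take every possible error-free read of length read_len (a sliding window).
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    # A read seen only once can be placed unambiguously in this toy model.
    return sum(1 for r in reads if counts[r] == 1) / len(reads)


rng = random.Random(0)  # fixed seed so the run is repeatable

# "Random DNA": 5000 bases drawn uniformly from A/C/G/T.
random_seq = "".join(rng.choice("ACGT") for _ in range(5000))

# Toy low-complexity region: a 5-base motif repeated in tandem (hypothetical).
repeat_seq = "GATTA" * 1000

random_frac = unique_read_fraction(random_seq, 30)
repeat_frac = unique_read_fraction(repeat_seq, 30)

print(f"random sequence: {random_frac:.3f} of 30-base reads are unique")
print(f"tandem repeat:   {repeat_frac:.3f} of 30-base reads are unique")
```

In the random sequence essentially every 30-base read should be unique, so overlaps between reads are unambiguous. In the tandem repeat no read is unique, because every read matches hundreds of positions, so short reads alone can't even tell you how many copies of the motif there are.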