
What was announced 20 years ago was an incomplete assembly that met certain metrics that made sense at the time. https://www.nature.com/articles/35057062 is the paper. It describes the assembly as a "partial draft".

The section "Background to the Human Genome Project" gives some color on why they did what they did (TL;DR there was an ostensibly competitive race between the public project and a private one).

I ended up providing some useful resources for uncovering just how bad genomic assemblies were at the computational level. Most assemblies built from whole genome shotgun (WGS) sequencing relied on a number of heuristics that were believed to be correct, but I suspected that the heuristics failed to deal well with repetitive regions and short sequences. So I built a computing system with >1M Xeon cores (Google Exacycle) and we provided the system to Gene Myers (who did the original WGS assembly for Celera). He used it to do an all-vs-all comparison of sequence pairs, which found numerous bugs and problems with the heuristics that were being used. It was a huge amount of compute, but the result was that Myers was able to use PacBio data to assemble a significantly better genome, faster, on a laptop: (https://www.yuzuki.org/favorite-talk-agbt-2014-gene-myers-ma...)
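
To make the heuristic-vs-exhaustive point concrete, here is a toy sketch (not the Exacycle pipeline and not Myers's actual code). It assumes error-free reads and exact suffix/prefix matching, and every function name in it is made up for illustration. A k-mer seed filter, one common kind of shortcut, skips any read pair that shares no exact k-mer, so it silently drops true overlaps shorter than k, while the brute-force all-vs-all pass finds them:

```python
from itertools import permutations

def suffix_prefix_overlap(a, b, min_len=1):
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    Hypothetical helper: real overlappers must tolerate sequencing errors.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def kmers(seq, k):
    """Set of all substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def all_vs_all(reads, min_len):
    """Exhaustive: test every ordered pair of reads."""
    found = {}
    for i, j in permutations(range(len(reads)), 2):
        n = suffix_prefix_overlap(reads[i], reads[j], min_len)
        if n:
            found[(i, j)] = n
    return found

def seeded(reads, min_len, k):
    """Heuristic: only test pairs that share at least one exact k-mer."""
    index = [kmers(r, k) for r in reads]
    found = {}
    for i, j in permutations(range(len(reads)), 2):
        if index[i] & index[j]:  # seed filter; pairs without a shared k-mer are skipped
            n = suffix_prefix_overlap(reads[i], reads[j], min_len)
            if n:
                found[(i, j)] = n
    return found

# Made-up reads: read 0 overlaps read 1 by 4 bases, read 1 overlaps read 2 by 3.
reads = ["ACGTTGCA", "TGCAGGTC", "GTCAAACT"]
exhaustive = all_vs_all(reads, min_len=3)
heuristic = seeded(reads, min_len=3, k=4)
missed = {pair: n for pair, n in exhaustive.items() if pair not in heuristic}
```

With these reads, `missed` contains the 3-base overlap between reads 1 and 2, which the k=4 seed filter never examines. The exhaustive pass is quadratic in the number of reads, which is why doing it at genome scale took that much compute.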