Hacker News
by comstock 3063 days ago
I’d agree with you that long reads would be useful if the error rate wasn’t so shockingly bad.

There is likely value in long reads, but what non-niche research applications are there for highly error-prone reads that would justify a valuation of several billion dollars?


Virtually all applications can benefit from long reads. There are already hybrid assemblers out there that take Illumina, PacBio and Nanopore reads. The long reads tie the short reads together, whereas the short reads improve the accuracy.
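As a toy illustration of long reads tying short-read contigs together, the sketch below places accurate contigs along one noisy long read using exact k-mer matches and sorts them by median hit position. Everything here is hypothetical: the function names, k = 11, the simulated genome, and the error model (a substitution at every 20th base, no indels). Real hybrid assemblers use error-tolerant alignment rather than exact k-mer hits.

```python
import random
from statistics import median


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in seq to the list of positions where it starts."""
    index: dict[str, list[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def order_contigs(contigs: dict[str, str], long_read: str, k: int = 11) -> list[str]:
    """Toy scaffolding: order accurate contigs by where their exact k-mer
    hits fall along a single noisy long read (hypothetical helper)."""
    index = kmer_index(long_read, k)
    anchors: dict[str, float] = {}
    for name, contig in contigs.items():
        hits = [pos for i in range(len(contig) - k + 1)
                for pos in index.get(contig[i:i + k], [])]
        if hits:  # a contig with no exact k-mer hit cannot be placed
            anchors[name] = median(hits)  # median is robust to stray matches
    return sorted(anchors, key=anchors.get)


# Simulated data (assumption): a random 300 bp genome and a long read that
# carries a substitution at every 20th base, standing in for read errors.
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(300))
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
long_read = "".join(swap[b] if i % 20 == 0 else b for i, b in enumerate(genome))

# Three accurate "short-read" contigs, deliberately supplied out of order.
contigs = {"c2": genome[200:300], "c0": genome[0:100], "c1": genome[100:200]}
print(order_contigs(contigs, long_read))
```

Even though the long read has an error every 20 bases, enough error-free 11-mers survive to anchor each contig. The contigs supply the accurate sequence, and the long read supplies the order.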

The area where DNA sequencing will first revolutionize clinical practice is sequencing pathogens to identify them. In these cases nanopore sequencing rules, because it can give answers in minutes.

Most clinical applications don’t need long reads. Pathogen identification from short reads is easy. Blood tests for cancer and NIPT (which will likely be the first big applications) both use fragmented DNA in the blood, so long reads are not useful. Depth (lots of sequencing) and quality are far more important.
It's worth noting that those clinical applications were developed when technology didn't allow long reads, so "clinical applications don't need long reads" is at present a truism. There may be potential applications that require long reads that simply couldn't have been invented yet (although I haven't the slightest idea what those would be).
Yes, but I would say quality is most important in almost all cases. Well, quality defined as a <1% error rate, which isn’t such a high bar.

The most compelling near-term applications (NIPT, etc.) use fragmented DNA, and long reads will have no benefit there.

So, yes. Long reads are useful, but you need to have at least reasonable performance in other respects. The same thing has been seen with PacBio, who have not played well in the market, despite having a read length advantage.

How long does it take to get the answer? Even if a big, expensive short-read sequencing machine is in the building, it still takes a day or two to get the necessary data.

With sepsis, every hour counts.

The per-base error rate is bad. In the case of PacBio, the error process approximates white noise, so you can deal with it perfectly by increasing read coverage. Things are somewhat more complicated with the nanopore tech described in this post, as errors may be correlated because of the way basecalling is done, but in practice it's not nearly as big a problem as you think it is.

For things approaching a read length, the per-base error rate of a single read is simply irrelevant. In practice, with sufficient coverage (e.g. 20x), you simply don't care about the per-base error rate of the reads.
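The coverage argument can be made concrete with a binomial calculation. This is a minimal sketch, assuming each read errs at a given position independently with probability p (the white-noise case) and, pessimistically, that every erroneous read reports the same wrong base. Under those assumptions, a per-position majority vote is wrong when at least half the reads are wrong. The `systematic` parameter is a hypothetical stand-in for correlated basecalling errors, modeled as a fraction of positions where every read makes the same mistake. The 10% per-read error rate used below is illustrative, not a measured figure for any platform.

```python
import math


def majority_error(p: float, coverage: int) -> float:
    """P(majority vote is wrong) when each of `coverage` reads is independently
    wrong with probability p and all wrong reads agree (pessimistic).
    Ties (k == coverage / 2) count as failures."""
    first_bad = (coverage + 1) // 2  # smallest number of wrong reads that loses the vote
    return sum(
        math.comb(coverage, k) * p**k * (1 - p) ** (coverage - k)
        for k in range(first_bad, coverage + 1)
    )


def consensus_error(p: float, coverage: int, systematic: float) -> float:
    """Add a hypothetical fraction of positions with a correlated error that
    every read shares; more coverage cannot vote those away."""
    return systematic + (1 - systematic) * majority_error(p, coverage)


for n in (1, 5, 10, 20):
    print(f"{n:>3}x  independent only: {consensus_error(0.1, n, 0.0):.2e}  "
          f"with 1% systematic: {consensus_error(0.1, n, 0.01):.2e}")
```

With p = 0.1, the independent-error term falls from 0.1 at 1x to below 1e-5 at 20x, which is the white-noise case described above. The systematic term sets a floor that coverage alone never lowers, which is why correlated nanopore errors complicate the picture. Real errors also include indels and spread across three alternative bases, so this is only a rough model.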

That might be the case if the throughput wasn’t so low and the error rate wasn’t so high.
I feel like there is or was an assumption that they'd be able to improve their tech.