| HN Mirror



	by throwaway487548 2767 days ago
	And the error rate is...? If the same genome is sequenced twice, how many differences are there between the two sequences? Please don't tell me it's none.

5 comments

AlexCoventry 2767 days ago

"99,7% SNP precision and sensitivity." [0]

Sensitivity = true-positive-rate = 0.997.

Precision = 0.997 = #true-positives / (#true-positives + #false-positives) = true-positive-rate / (true-positive-rate + false-positive-rate) = 0.997 => true-positive-rate + false-positive-rate = 1 => false-positive-rate = 0.003. [1]
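The definitions from [1] can be written directly in terms of confusion-matrix counts. This is a minimal Python sketch. The function names and the counts are hypothetical and chosen only for illustration. With these made-up counts, precision and sensitivity both come out at 0.997, while the per-position false-positive rate is much smaller. That is because the false-positive rate also depends on the number of true negatives, which is very large when most positions are not variants.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: fraction of real variants that were called."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Positive predictive value: fraction of calls that are real variants."""
    return tp / (tp + fp)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of non-variant positions wrongly called as variants."""
    return fp / (fp + tn)


# Hypothetical counts, for illustration only.
tp, fn, fp, tn = 997, 3, 3, 999_997
print(sensitivity(tp, fn))          # 0.997
print(precision(tp, fp))            # 0.997
print(false_positive_rate(fp, tn))  # 3e-06
```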

That seems like a very high error rate: about 10 million errors in the three-gigabase genome, and 100 thousand errors in the 30-megabase exome (the protein-coding regions). That might be an acceptable rate for population-level analysis if the errors are sufficiently uncorrelated, but I wouldn't want to make decisions based on it for personalized medicine. For comparison, here's a rough estimate that an individual human genome has 2-3 million SNPs [2].
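The back-of-the-envelope figures above can be reproduced under the comment's own assumption that the 0.003 rate applies independently to every base. This is a small Python sketch. The constants are the round numbers from the comment, and the helper name is hypothetical.

```python
# Round figures from the comment above.
ERROR_RATE = 0.003            # rate derived above, assumed here to apply per base
GENOME_BASES = 3_000_000_000  # ~3 gigabases
EXOME_BASES = 30_000_000      # ~30 megabases of protein-coding sequence


def expected_errors(bases: int, rate: float = ERROR_RATE) -> float:
    """Expected number of wrong calls if each base is wrong with probability `rate`."""
    return bases * rate


print(f"genome: {expected_errors(GENOME_BASES):,.0f}")  # genome: 9,000,000
print(f"exome:  {expected_errors(EXOME_BASES):,.0f}")   # exome:  90,000
```

These round to the "about 10 million" and "100 thousand" quoted in the comment.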

I thought you could do better than that with 30x coverage, so I might be misinterpreting them somehow. Or maybe they're using an unconventional sequencing technology that is cheaper but less accurate.
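The definition quoted in the comment, precision = #true-positives / (#true-positives + #false-positives), offers one way to check the interpretation it is unsure about. Suppose the 0.3% shortfall is read as a share of SNP calls rather than of all bases. Then the expected number of false calls scales with the number of true SNPs called, not with genome length. This rough Python sketch takes 2.5 million true SNPs as an assumed midpoint of the 2-3 million estimate [2]. The helper name is hypothetical.

```python
def false_calls_from_precision(true_positives: float, precision: float) -> float:
    """Solve precision = TP / (TP + FP) for FP."""
    return true_positives * (1 - precision) / precision


true_snps = 2_500_000  # assumed midpoint of the 2-3 million estimate
sensitivity = 0.997

called_true = true_snps * sensitivity  # true SNPs actually detected
missed = true_snps - called_true       # false negatives
false_calls = false_calls_from_precision(called_true, 0.997)

print(f"missed SNPs:     {missed:,.0f}")       # missed SNPs:     7,500
print(f"false SNP calls: {false_calls:,.0f}")  # false SNP calls: 7,500
```

Under this reading, the errors number in the thousands rather than the millions. The vendor page cited in [0] is not quoted here in enough detail to say which reading it intends.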

[0] https://us.dantelabs.com/products/whole-genome-sequencing-wg...

[1] Equations given here: https://en.wikipedia.org/wiki/Sensitivity_and_specificity

[2] https://biology.stackexchange.com/a/51315/37343


gravelc 2767 days ago

Has anyone ever claimed it's none?

There's no simple answer to your question as it depends on many things - sequencing technology used, library prep and coverage to name a few.

Generally, it's not far from none when aligning short reads to a high-quality reference genome. Provided there's sufficient coverage and a majority of the reads covering a particular nucleotide don't have an error at that position, the correct answer will be given. Errors creep in through things like systematic errors in library prep (such as a PCR error), and very low coverage over particular loci due to unusual AT/GC content, which makes errors harder to correct for. Repetitive regions can cause issues for short-read alignment too, but coding regions generally aren't that repetitive.
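The majority-vote argument can be made concrete with a binomial model. This minimal sketch makes several assumptions. Each read's base call at a position is assumed to be wrong independently with probability e. Every error is also assumed to land on the same wrong base, which is a pessimistic simplification. Ties count as failures. The model ignores systematic errors such as the PCR artefacts mentioned above, which are exactly the cases where independence fails. The 1% per-read error rate and the function name are hypothetical.

```python
from math import comb


def majority_wrong_probability(depth: int, error_rate: float) -> float:
    """P(at least half of `depth` independent reads carry the same wrong base).

    Pessimistic simplification: all erroneous reads are assumed to agree on
    one wrong base, and ties are counted as failures.
    """
    threshold = (depth + 1) // 2  # smallest count that is at least half of depth
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


for depth in (5, 10, 30):
    print(depth, majority_wrong_probability(depth, 0.01))
```

Under these assumptions, the error probability falls off steeply with depth. This is consistent with the point that residual errors come mostly from low-coverage loci and correlated errors rather than from random read errors.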

$200 is very cheap for WGS; I'm guessing it would be at the low end of the accuracy range, as they presumably can't be sequencing to great depth.


klmr 2766 days ago

They sequence at 30x using BGI technology, which means they are providing the current offer at a loss.


heartles 2767 days ago

The sample file they provide lists 12 errors, so likely at least that.

https://s3.amazonaws.com/dantelabswebsite/Dante+Labs+Genome+...

EDIT: Unless I'm reading the results wrong


legulere 2767 days ago

The genes don’t even have to be the same when you have two samples from a person: https://www.sciencedaily.com/releases/2009/07/090715131449.h...


jghn 2767 days ago

The sequencers they’re using are arguably of lower quality than the gold standard.


mnw21cam 2767 days ago

In the genetics world, when you say the words "Gold standard", that usually translates to "Sanger sequencing", which is a high accuracy method of sequencing a small section of DNA, like a single gene. I don't think your statement is very helpful in that context.

Most of the world's whole genome sequencing is done using the Illumina platform. This service is using the BGI platform, which is arguably higher quality than Illumina. Our lab has data showing the error rate with BGI is about 1/6 the error rate of Illumina.

Yes, there are some even better sequencing technologies out there, such as PacBio, which provides longer reads capable of sequencing slightly more of the genome, and the error rates are constantly improving. However, these technologies are much more expensive.
