by dcdanko 3112 days ago
Probably not much at all. SNPs and small indels tend to have many neighbors with which they're highly correlated. If a variant caller misses a single SNP, it has likely still called several others that nearly always co-occur with it. In most cases downstream association studies would be unaffected.
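To make the correlation point concrete, here's a rough sketch that computes r^2 between genotype vectors (alt-allele counts of 0/1/2 per sample). The genotypes are made up for illustration and `r_squared` is just a helper I'm defining here, not something from GATK or DeepVariant.

```python
from statistics import mean

def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors (0/1/2 alt-allele counts)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic site carries no correlation signal
    return cov * cov / (var_a * var_b)

# Toy, invented genotypes for 8 samples (not real data).
missed_snp = [0, 1, 2, 0, 1, 0, 2, 1]  # the SNP the caller failed to report
neighbor   = [0, 1, 2, 0, 1, 0, 2, 2]  # nearby SNP, differs in one sample
unlinked   = [1, 0, 1, 2, 0, 1, 0, 2]  # SNP that does not track the missed one

print("missed vs neighbor:", r_squared(missed_snp, neighbor))
print("missed vs unlinked:", r_squared(missed_snp, unlinked))
```

If the neighbor's r^2 with the missed SNP is high, an association test on the neighbor picks up most of the same signal, which is why one dropped call usually doesn't matter downstream.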

It's actually possible that DeepVariant is implicitly learning some of these correlations (1). That would make it very bad at picking out the rare individuals who don't fit a trend (and who tend to be very important for identifying disease loci). GATK definitely does not know about correlated SNPs.

(1) The paper implies this is not the case, saying that DeepVariant works for other genomes without retraining, but they don't show the relevant results.