by BKPetkov 3882 days ago
How long does it take (and with what computational bandwidth) to produce a 125MB variant file from 200GB raw sequence data?

Depending on the pipeline you use and the compute resources available, you could have a full workflow done in anywhere from several hours to a couple of days. Illumina BaseSpace is free (for now) and has some example data sets with a bunch of canned analysis pipelines, if you're interested in trying it for yourself. https://basespace.illumina.com/
You're not going to get from raw reads to a VCF on a whole genome in several hours.
With the right hardware and software you can. Edico's DRAGEN claims BCL -> VCF in about 20 minutes [1]. With Microsoft Research's SNAP aligner and 450GB of memory you can get a whole-genome alignment in ~30 minutes, and variant calling can then be done in a couple of hours.

1. http://www.edicogenome.com/dragen/dragen-gp/

Could you please elaborate on this?
I've seen 200GB runs take 4 days, and I've seen runs take 3 hours. It depends on your computing infrastructure, but more important is your I/O. High CPU core counts and high-speed storage access make a big difference, as does distributing the computational workload.
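To put rough numbers on why I/O matters, here's a back-of-the-envelope sketch. The throughputs and the number of full passes over the data (reading FASTQ, writing and sorting BAM, etc.) are hypothetical assumptions, not figures from this thread, and it only bounds time spent moving data, ignoring CPU entirely.

```python
# Back-of-the-envelope I/O lower bound for a sequencing pipeline.
# Assumptions (hypothetical): decimal units (1 GB = 1e9 bytes), a steady
# sustained throughput, and a guessed number of full passes over the data.

def io_seconds(data_bytes: float, throughput_bytes_per_s: float, passes: int = 1) -> float:
    """Minimum wall-clock seconds spent only moving data, ignoring CPU time."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes * passes / throughput_bytes_per_s


if __name__ == "__main__":
    raw = 200e9  # 200 GB of raw sequence data, as in the question
    # Hypothetical sustained storage throughputs, in bytes per second.
    for label, rate in [("100 MB/s", 100e6), ("500 MB/s", 500e6), ("2 GB/s", 2e9)]:
        hours = io_seconds(raw, rate, passes=5) / 3600
        print(f"{label:>9}: {hours:5.2f} h of pure I/O for 5 passes")
```

For example, five passes over 200 GB at 100 MB/s is 10,000 seconds, close to three hours before any computation happens, which is one reason faster storage and spreading the work across machines shorten runs so much.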