| > I wrote a post about why GATK, one of the most popular bioinformatics tools in Next Generation Sequencing, should not be put into a clinical pipeline: I've seen you link to your blog post a couple of times now, and I still think it's misleading. I do wonder whether your conflict of interest (selling competing software) has led you to come to a pretty unreasonable conclusion. (My conflict of interest is that I have a Broad affiliation, though I'm not a GATK developer.) In your blog post, you received output from 23andMe. The GATK was part of the processing pipeline that they used. What you received from 23andMe indicated that you had a loss of function indel in a gene. However, it turns out that upon re-analysis, that was not present in your genome; it was just present in the genome of someone else processed at the same time as you. Somehow, the conclusion that you draw is that the GATK should not be used in a clinical pipeline. This is hugely problematic: 1) It's not clear that there were any errors made by the GATK. Someone at 23andMe said it was a GATK error, but the difference between "user error" and "software error" can be blurred for advantage. It's open source, so can someone demonstrate where this bug was fixed, if it ever existed? 2) Now let's assume that there was truly a bug. Is it not the job of the entity using the software to check it to ensure quality? An appropriate suite of test data would surely have caught this error yielding the wrong output. Wouldn't it be as fair, if not more so, to say that 23andMe should not be used for clinical purposes since they don't do a good job of paying attention to their output? Your blog post shows, for sure, a failure at 23andMe. Depending on whether the erroneous output was purely due to 23andMe or if the GATK had a bug in production code, your post shows an interesting system failure: an alignment of mistakes at 23andMe and in the GATK. 
But I really don't think it remotely supports the argument that the GATK is unsuitable for use in a clinical sequencing pipeline. |
On your second point: 23andMe had every incentive to pay attention to their output, and it is fair to say they bear responsibility for letting this slip through. But it's worth noting, in the context of the OP's rant, that 23andMe probably paid much more attention to their tools than most academics, who often treat alignment and variant calling as a black box that they trust to work as advertised.
So what I actually argue in the post (and should have stated more clearly in my summary here) is that the GATK team is incentivised, as developers of an academic research tool, to advance its feature set quickly, at the cost of bugs being introduced (and hopefully squashed) along the way.
This "dev" state of a tool is inappropriate for a clinical pipeline, and the GATK team's answer to that is a "stable" branch of GATK that will be supported by their commercial software partner. Good stuff.
Finally, I actually have no conflict of interest here, as Golden Helix does not sell commercial secondary analysis tools (like CLC Bio does). I wrote this from the perspective of a 23andMe consumer and of someone who needs to stay informed because I recommend upstream tools to our users (and, I might add, I would still recommend and use GATK for research, with the caution to potentially forgo the latest release for a more stable one).
You know, though, the conflict-of-interest dismissal is something I run into more than I would expect. I'm not sure if some commercial software vendor has acted in bad faith in our industry to deserve the cynicism, or if it is simply inherited by default from the "academic" vs "industry" ethos.