by gabeiscoding 4889 days ago
On your first point, my post detailed that 23andMe confirmed it was a GATK bug that introduced the bogus variants and the bug was fixed in the next minor release of the software. There are comments on the post from members of 23andMe and the GATK team that go into more details as well.

On your second point, 23andMe had every incentive to pay attention to their output, and it is fair to say they bear responsibility for letting this slip through. But it's worth noting, in the context of the OP's rant, that 23andMe probably paid much more attention to their tools than most academics, who often treat alignment and variant calling as a black box that they trust works as advertised.

So what I actually argue in the post (and should have stated more clearly in my summary here) was that GATK is incentivised, as an academic research tool, to quickly advance their set of features with the cost of bugs being introduced (and hopefully squashed) along the way.

This "dev" state of a tool is inappropriate for a clinical pipeline, and the GATK team's answer to that is a "stable" branch of GATK that will be supported by their commercial software partner. Good stuff.

Finally, I actually have no conflict of interest here, as Golden Helix does not sell commercial secondary analysis tools (like CLC Bio does). I wrote this from the perspective of someone who is a 23andMe consumer and who needs to stay informed because I recommend upstream tools to our users (and, I might add, I would still recommend and use GATK for research use, with the caution to potentially forgo the latest release for a more stable one).

You know though, the conflict of interest dismissal is something I run into more than I would expect. I'm not sure if some commercial software vendor has acted in bad faith in our industry to deserve the cynicism, or if it is simply inherited by default from the "academic" vs "industry" ethos.

1 comments

> So what I actually argue in the post (and should have stated more clearly in my summary here) was that GATK is incentivised, as an academic research tool, to quickly advance their set of features with the cost of bugs being introduced (and hopefully squashed) along the way.

Sure, I agree with that. And I would agree if you said, "Using bleeding-edge nightly builds of %s for production-level clinical work is a bad idea," whether %s was the GATK or the Linux kernel. I would be in such complete agreement that I wouldn't even have felt compelled to respond to your posts if that's what you had said originally, rather than saying, "the GATK ... should not be put into a clinical pipeline". The former is accepted practice industry-wide; the latter reads like FUD and cannot be justified by one anecdote.

> You know though, the conflict of interest dismissal is something I run into more than I would expect.

Regarding conflict of interest, my point was to understand your potential interests, and also to disclose my own so that you can see where I'm coming from. That's not a dismissal, it's a search for a more complete picture. Interested parties are often the most qualified commenters, anyway, but their conclusions merit review.

Hopefully people wouldn't dismiss my views because of my Broad connection, any more than they would dismiss yours if you sold a competing product.

The key is that 23andMe was not using bleeding-edge nightly builds but official "upgrade-recommended" releases.

GATK currently has no concept of a "stable" branch of their repo (Appistry is going to provide quarterly releases in the future, which is great).

The flag I am raising is that a "stable" release is needed before GATK gets integrated into a clinical pipeline. Because the Broad's reputation is so high, it is important to raise this flag, as otherwise researchers and even clinical bioinformaticians assume that choosing the latest release of GATK for their black-box variant caller is as safe as an IT manager choosing IBM.

Good call. Much like an Ubuntu LTS release, having stable freezes of the GATK (now that it's relatively mature) that only get bug fixes but no new (possibly bug-prone) features is a great idea.