by gabeiscoding 3813 days ago
While I think it's great to have Google putting their weight behind standardization efforts like the Global Alliance for Genomics and Health (GA4GH), I really don't get the need to replace VCF and BAM files with API calls.

Ultimately, the "hard part" about genomics is not that it is big data requiring Spanner and BigTable to get anything done. I actually wrote a blog post on this topic this week:

http://blog.goldenhelix.com/grudy/genomic-data-is-big-data-b...

Both BAM and VCF files can be hosted on a plain HTTP file server and be meaningfully queried through their BAI/TBI indexes. Visualization tools like our GenomeBrowse or the Broad's IGV can already read S3-hosted genomic files directly and very efficiently, without an API layer (the files are gzip-compressed blocks of binary data). So I see translating the exact same data into an API-only storage system, where I can't download the VCF and do quick, iterative analysis on it, as more of a downside than a plus.
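
To make the indexed-access point concrete, here is a minimal Python sketch of how a client might pull one compressed block out of a remotely hosted file. It assumes the BGZF layout from the SAM/BAM spec, where a virtual offset packs the block's compressed byte offset into the upper 48 bits and the position inside the decompressed block into the lower 16. The helper names and the in-memory stand-in file are hypothetical, the stand-in uses ordinary gzip members rather than real BGZF blocks, and parsing an actual BAI/TBI index is left out.

```python
import gzip
import urllib.request
import zlib


def split_virtual_offset(voffset):
    """Split a BGZF virtual offset into (compressed block offset, offset within the block)."""
    return voffset >> 16, voffset & 0xFFFF


def fetch_range(url, start, end):
    """Fetch bytes start..end (inclusive) with a plain HTTP Range request.

    Hypothetical helper: works against any server that honors Range headers.
    It is defined here for illustration and not called in the demo below.
    """
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def decompress_first_block(data):
    """Decompress the single gzip member that starts at the beginning of data."""
    # wbits=31 tells zlib to expect a gzip header and trailer.
    # Bytes after the end of the first member are left in unused_data.
    decompressor = zlib.decompressobj(wbits=31)
    return decompressor.decompress(data)


# In-memory stand-in for a hosted file: two independently compressed blocks.
blocks = [gzip.compress(b"chr1 records\n"), gzip.compress(b"chr2 records\n")]
hosted_file = b"".join(blocks)

# Pretend an index lookup returned this virtual offset:
# the start of the second block, 5 bytes into its decompressed payload.
voffset = (len(blocks[0]) << 16) | 5
coffset, uoffset = split_virtual_offset(voffset)

# With a real file, this slice would be fetch_range(url, coffset, some_end).
payload = decompress_first_block(hosted_file[coffset:])
print(payload[uoffset:])
```

In practice the byte range would come from the index, and fetch_range would be pointed at the file's URL on any static file server or S3 bucket that honors Range requests.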

Disclaimer: I build variant interpretation software for NGS data at Golden Helix. Our customers are often small clinical labs whose data size and volume are not driving them to the cloud.

1 comment

How do you think this compares against http://basepair.io ?
Looks great, but I can't comment more as I haven't used it.

It looks to be solving the same problems as DNAnexus, Seven Bridges, BaseSpace, etc.: wrapping open source tools in more user-friendly ways.

But it's orchestrating the production of a smaller set of data that still needs the next step of human interpretation, report writing, family-aware algorithms and more complex annotations (the problem space Golden Helix is in).

In other words, it handles the automatable bits, which are not the hard part that I mentioned in my blog post.