by east2west 1734 days ago
I recall that the group that created Spark had a bioinformatics project on Spark, but I don't know what happened to it. All I can find now is a paper [1] hosted by Databricks.

[1]https://databricks.com/wp-content/uploads/2018/08/SSE15-40-D...

We're here, still plugging along.

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

https://github.com/bigdatagenomics/adam

Yep, that's the one I was thinking of (along with gnomAD, which IIRC uses ADAM or some similar tech). My main complaint with ADAM was that they came up with their own file format (which had some flaws), but the general idea is the right one.
I'm interested in chatting with you about this, and about genomics on Spark more generally. Feel free to reach out on GitHub or via my username at the usual suspects.
I left this field, actually. I cofounded Google Cloud Genomics, and when I proposed that we pivot from working with GA4GH (very stupid APIs) to working with ADAM (real data processing), I got kicked off the team. Since then I've come to see genomics as a minefield of bad practices and don't really work in the field anymore, except to help scientists run their workflows in the cloud.