Hacker News new | ask | show | jobs
by monatron 3335 days ago
Reproducibility is a big problem in analytic pipelines, specifically genomics. Researchers tend tweak their tools here and there and it has always been quite difficult to "package" a pipeline. Docker makes that much easier. We've seen a huge movement in the containerization of genomics pipelines, the results of which (we hope) help cure serious disease. In my world, we're using Docker every day to try and find cures for pediatric brain cancer. It's obviously not our only tool, but it helps.
2 comments

Not in genomics but the same in true in spatial pipelines. Trying to get a machine setup with the correct set of libraries and packages across various languages is almost impossible on dev box much less on a prod cluster. Docker makes this easy to do once locally, validate the build and then use the same container to do the production runs.
Neural networks that construct ontologies from gene similarity matrices for discovering cancer pathways are tools that help cure cancer. There are a myriad of technologies supporting these type of applications; Torch, Python, Tensorflow... Docker is a minor detail that has as much to do with curing cancer as JSON or HTTP. When you make a claim like the one Docker is making, you should be a critical piece of the pipeline that actively contributes towards the end goal, not a piece of infrastructure. As a scientist, I find it disgusting that they're using cancer research as their buzz word for stability and importance.
> Reproducibility is a big problem in analytic pipelines, specifically genomics... [Docker solves this problem].

There is a legitimate fundamental issue with reproducing people's setups and Docker solves this. Ergo docker is valuable technology for data science.

Literally typing `docker run...` to reproduce results (vs. VM setup, installation etc.) seems important enough to justify their phrasing.

Fact: there are handfuls of data science startups that essentially rebrand docker and have been funded in the hundreds of millions of dollars specifically to solve this problem.