Hacker News
by collyw 4004 days ago
To tell all the cool kids you are using it.

(OK, I know there are real use cases for Docker, but I see a lot of hype as well. People are telling my mathematician friend that she needs to use Docker at the start of her project, when it is likely to be a one-off graph she needs to produce for a research paper.)


There is a big push for reproducibility in science. If your friend can package the process for building that graph in a Dockerfile, it is more likely that readers of her paper will be able to reproduce her results.
Or, you know, publish the formula, so readers can reproduce it in whatever language or system they want.

Reproducibility is a big push... but not like you are suggesting. Shipping a Dockerfile is the equivalent of saying "This works, if you use this flask, this pipette, this GCMS and this piece of litmus paper."

Docker is not the only solution to problems. It solves some, but you can't tack it on to everything.

Why not both? I am not in academia, but I was under the impression that some academics might be publishing 'questionable' results that cannot be reproduced at all in order to get their paper count up for tenure review. Not to mention puff pieces from industry that basically serve as PR in peer-reviewed journals without furthering their discipline.

So shipping working code (even if it comes with a required pipette) might be a nice requirement for a peer-reviewed publication to take on in order to keep its journal relevant. Shipping in Docker or similar guarantees reproducibility.

If the code is crap and only works on one particular data set, then putting it in a Docker container ain't going to help.
This is an area where a lot of companies are focusing in terms of data science.

As you noted, reproducibility is a huge issue in the scientific community (according to Docker users/vendors I've spoken to), to the extent that there are a number of funded startups trying to solve this problem (some using Docker).

What has also been very surprising is the big companies that have read-only copies of analytic data they want to run computations on: sandboxing the data scientists' scripts in a container has helped them tremendously in terms of supporting the execution.