by pytyper2 2890 days ago
I'm not sure why scientists don't use VMs and simply save the virtual disk files. That would at the very least allow them to verify the settings at a later date. Fresh-install reproducibility doesn't seem necessary to verify experimental findings as long as the original VM is available to boot up.
2 comments

My guesses are that:

1. Integrating the development environment on their host PC (for example connecting RStudio in R's case, or connecting their web browser back to a server running in the VM in the case of Jupyter) is another set of skills to master.

2. Many data analyses are memory hungry unless you want to resort to coding practices that optimize for memory consumption. The overhead of running a VM is a bummer for some scientists.

3. Many scientists are not using Linux top-to-bottom, and therefore don't have a great way of virtualizing a platform that they are familiar with (e.g. Windows, macOS).

Can people think of others? I'm sure I'm missing some.

(EDIT: To be clear, I think VMs are a great path, but I do think there are some practical reasons why some scientists don't use them)
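
To make point 2 concrete, here is a minimal sketch of what "coding practices that optimize for memory consumption" can look like: computing a column mean one row at a time instead of loading the whole dataset. The column name `reading` and the in-memory sample are made up for illustration; a real analysis would read from a file on disk.

```python
import csv
import io

def streaming_mean(rows, column):
    """Return (mean, count) for one column, holding only one row in memory."""
    count = 0
    mean = 0.0
    for row in rows:
        value = float(row[column])
        count += 1
        # Incremental update: no list of values is ever built.
        mean += (value - mean) / count
    return mean, count  # (0.0, 0) if there were no rows

# Hypothetical stand-in for a large CSV file on disk.
sample = io.StringIO("reading\n1.0\n2.0\n3.0\n4.0\n")
mean, n = streaming_mean(csv.DictReader(sample), "reading")
print(mean, n)
```

This is the kind of extra care that a scientist may not want to take on just to leave headroom for a VM.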

Often scientists are using hardware to acquire new data. The acquisition hardware might be on a PC that came installed from the manufacturer where you are told not to change anything.

Touching that PC in any way would be considered harmful to everybody using that specific piece of equipment.

Therefore, from the beginning of your acquisition, you are basically using a machine you don't control.

I think these and other issues can be solved with technical training.
Sure, they’re all mitigatable, but that technical training is competing with a lot of other considerations within the limited brainwidth of a scientist.

From the scientist’s perspective, a lot of this can start to feel like yak shaving. The opportunity costs are real.

Eh, maybe. VirtualBox is point and click at this point. Taken on in conjunction with institutional IT departments, as hopefully happens with all desktop point-and-click software, it is totally doable with 5-10 hours of training and some typed desk procedures. Learning new tools and workflows seems to be part of the job. As I typed that, I also thought of a different response from the perspective of a leader and software engineer; I did not type that response.
VMs are an easy copout to a problem that shouldn't be a problem in the first place.
That's not true. Python has C libraries, some of which might need to be built from source, and there's good reason not to allow root access on a lot of systems (or the ability to install headers/dev packages, gcc, etc.). System package management is hard, and coordinating with language package managers (which are ubiquitous, not specific to Python) magnifies it. Unless you had some other solution in mind that I've missed...
We have had the ability to run packages from a custom "root" prefix for ages in UNIX. If only package management tools all worked together, with the same central store, and respecting this...

There's never a need to mess with root access, not even to install headers/dev packages. Dev tools and compilers can also be made to look in custom locations. I mean, there's never a need aside from the self-imposed limitations our OSes and tooling place upon us.

Nothing inherently complex: just tons of accidental complexity.

Those things are only hard because we never did any coordinated effort to fix them.
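
As one narrow illustration of the "custom prefix, no root" idea, here is a sketch using Python's standard `venv` module to create an isolated prefix in a user-writable directory. It covers only Python itself, not C libraries or headers. The directory name `analysis-env` and the helper `create_user_prefix` are hypothetical.

```python
import sys
import tempfile
import venv
from pathlib import Path

def create_user_prefix(prefix: Path) -> Path:
    """Create an isolated Python prefix under a user-writable directory."""
    # with_pip=False keeps this sketch offline; no root access is involved.
    venv.create(prefix, with_pip=False)
    # Executables live in "Scripts" on Windows and "bin" elsewhere.
    return prefix / ("Scripts" if sys.platform == "win32" else "bin")

with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp) / "analysis-env"  # hypothetical location
    bin_dir = create_user_prefix(prefix)
    has_cfg = (prefix / "pyvenv.cfg").exists()  # marker file written by venv
    print(bin_dir, has_cfg)
```

Everything here lands under a directory the user already owns, which is the property being argued for; getting compilers and system libraries to honor the same prefix is where the accidental complexity comes in.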

Conda basically proves that you can install almost everything one needs in the user's home directory. They have been working more and more on being completely independent from things like system compilers as well.
Conda is a great way to get gcc 7.2 on CentOS 6. Anaconda builds all of its packages targeting CentOS 6 for broad compatibility, but with the latest compilers to ensure we have the latest security features compiled in.
Goodness, you say that like it's a good thing. Yes, it is easy to download compiled binaries from a 3rd party.
I don't understand the nature of your discourse. You agreed that maintaining software distros is not easy, someone recommended conda, and yet you seem dismissive again?
This is not true, at least not without some sort of context around the work being performed and the requirements of the workflow.