by lsorber 2195 days ago
My experience has been the opposite: building Docker images is much easier with conda than it is with pip. With conda you can start from miniconda3, copy an environment.yml, and then conda create it. With pip, you might need to take additional steps to install system dependencies like build-essential first, and you'll need different tools to manage your virtual environment and Python installation too.

Yeah I know I'm on one side of the fence with this. The people I work with (not engineers) use conda all the time and love it.

A big thing for me is that apt + pip means I know more about what I'm installing, whereas conda seems like it'll go off and do what it thinks is best. If it's going into production then I want to know why we need package X and how it's been installed, rather than "conda says we need it".

Basically, I think conda encourages "install and forget" which means people don't really know or understand what they're putting into production. And that can cause a lot of problems further down the line.

Also, the fact that conda installs everything into a sandbox causes its own issues. Suddenly I have two versions of a system package. Now I've got to do extra work to deal with that.

Then again, it could just be that I never got over the time when uninstalling Anaconda ripped apart the Python install on my MacBook. That's a weekend I won't get back.

__

Also, production shouldn't use virtual environments, in my opinion. That's an additional deployment/build step which could fail one day. The container image is a virtual environment in and of itself anyway!

Sounds like you just need to train your colleagues to be a bit more disciplined with their package managing. It's not that hard to be clean about dependencies with conda. Maybe my take on it can inspire you here: https://haveagreatdata.com/posts/data-science-python-depende...

Re the extra "virtual environment", you can just use conda's base environment in prod. Here's how to do that: https://haveagreatdata.com/posts/step-by-step-docker-image-f...

Read your post. Yeah, conda works fine with pip stuff. And yes, what you've detailed is similar to pinning versions in requirements. Which I do as standard -- I even do it with apt installs sometimes.
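
To make the pinning point concrete, here is a minimal sketch of a check that flags requirement lines not pinned to an exact version. The function name `unpinned`, the regex, and the sample lines are all hypothetical illustrations, not something from the linked post, and the pattern only covers plain `name==version` entries (no URLs, markers, or `-r` includes).

```python
import re

# Assumption: a "pinned" line looks like "name==version", optionally with
# extras such as "pkg[extra]==1.0". Anything else is reported as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9,._-]+\])?==[^=\s]+$")


def unpinned(lines):
    """Return requirement lines that are not pinned to an exact version."""
    result = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not PINNED.match(line):
            result.append(line)
    return result


# Hypothetical sample contents of a requirements.txt
sample = [
    "numpy==1.18.1",
    "pandas>=1.0",
    "requests",
    "# a comment",
    "scikit-learn==0.22.1  # pinned",
]

print(unpinned(sample))
```

Something like this could run in CI before an image build, so a loose version spec gets caught before it reaches production.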

But I often need to install via system package management for other dependencies. conda doesn't respect the base system package manager. That is what causes headaches.

If conda respected system package management first, then installed as necessary, I wouldn't have a problem with it as an admin. But it doesn't because it's not built for engineering/admins (want stability + efficiency), it's built for scientific projects (want to run code easily).

Also, I'm using the "royal" we. Like, we as in admins generally. I'm the only admin in my team (voluntarily), so I need to be ruthless with this type of stuff.

EDIT:

I think you missed my point about virtual environments.

The entire container is a virtual environment. Why would we want to use another virtual environment for no reason except the fact that conda wants us to?

It adds extra steps which we'd have to maintain. Which means more developer resource spent on maintenance. Which means less time spent on new features.

It's just another thing that could go wrong. Simpler systems break less often.

I see where you're coming from.

My use case is this: as a data scientist, I start new code bases all the time. Each project, simple experiment, data analysis, etc. needs its own cleanly separated dependency environment so I don't end up in dependency hell (I have 12 conda environments on my machine right now). Conda allows me to handle these environments with ease (one tool and a handful of commands -> as detailed in the article). With conda, I also have my data science Python cleanly separated from my system Python.

Of course there are other tools that can handle this use case. But pip alone won't do the trick. I don't like to have three separate tools for this (pip + venv + pyenv).

When I put something into production, I naturally want to keep using my conda environment.yml and have the same environment in dev and prod instead of switching to pip + requirements.txt, which might introduce inconsistencies.