by whitehouse3 2805 days ago
While pipenv has garnered a lot of attention and praise for ease of use, it falls over whenever I integrate it with any serious work. Pipenv lock can take 20-30 minutes on a small flask app (~18 dependencies). And it often mixes up virtualenvs, enabling the wrong one with seemingly no remedy. I see the problems on Windows, macOS and Ubuntu. 2018 is not the year of pipenv, for me. I'm sticking with regular virtualenvs and the manual hell of requirements.txt. I hope it gets better eventually.

I tried a few of the package managers in Python recently (https://www.vincentprouillet.com/blog/overview-package-manag...) and came to the same conclusion about Pipenv. It is way too slow, and frankly the UX is not that great either.
I’m disappointed your post did not cover using conda. As the pipenv drama has rolled on, I’ve moved from viewing conda merely as the best user experience in Python environment & package management to viewing it as the only serious option for professional scientific computing work (and quite possibly any professional Python work at all).
I agree with you about using conda. And since no one has mentioned Jake VanderPlas's review of conda vs. the alternatives, myths, etc., here it is:

https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-mi...

I read that page and looked for the reason I don't use Conda (because I already have virtualenvs and I'm not prepared to burn them all down):

> Myth #5: conda doesn't work with virtualenv, so it's useless for my workflow

> Reality: You actually can install (some) conda packages within a virtualenv, but better is to use Conda's own environment manager: it is fully-compatible with pip and has several advantages over virtualenv.

> [...] the result seems to be fairly brittle – for example, trying to conda update python within the virtualenv fails in a very ungraceful and unrecoverable manner, seemingly related to the symlinks that underly virtualenv's architecture.

Doesn't sound like much of a myth then, if Conda's take on virtualenv is "you can technically do this, but everything will break ungracefully and unrecoverably, so please don't".

He's not saying that you _should_ install conda within a virtualenv, but that some have tried with some success.

At the end, one of his conclusions is: "If you want to install Python packages within an Isolated environment, pip+virtualenv and conda+conda-env are mostly interchangeable". So don't change if you don't have to.

But he does give reasons why conda may be superior to virtualenv: managing different versions of Python, tracking non-Python dependencies, true isolation of environments, etc.

I should probably write another post once I've tried conda a bit more. I used it very recently for a numpy/PyTorch environment and it was quite nice.
Although conda is also getting slower and slower, and now routinely spends 5-20 minutes on dependency resolution even for trivial environments.
This is deeply untrue for conda. Even very complex environments build in less than a minute. I can believe there are corner cases where conda is very slow, but claiming conda takes 5 minutes for trivial environments is flat out wrong. Perhaps it is an issue with a firewall, a VPN connection or something else; there is absolutely no chance that comes from conda executing normally.
Well, I experience it on a daily basis, and I'm not the only one! It's officially an open issue: https://github.com/conda/conda/issues/7239
The slowest operation in the linked thread is still taking less than 2 minutes...

Edit: correction, there are two examples that take longer, one at 3.5 minutes, one around 8 minutes. I don’t think it changes any takeaways though.

Am I weird that I use anaconda instead of virtualenvs? I guess it’s overkill if you aren’t using the other conda features.
It's worth mentioning that the package manager component of anaconda is released as a separate (small) install: miniconda[1]. It includes only Python and the package manager, and not all of the 700+ packages installed as part of a full anaconda installation.

With its ability to install Python and non-Python packages (including binaries), conda is my go-to for managing project environments and dependencies. Between the bioconda[2] and conda-forge[3] channels, it meets the needs of many on the computational side of the biological sciences. Being able to describe a full execution environment with a yaml file is a huge win for replicable science.

1. https://conda.io/miniconda.html https://conda.io/docs/user-guide/install/index.html

2. http://bioconda.github.io/

3. https://conda-forge.org/

It is overkill for pure-Python packages or packages with simple C extensions. Conda was developed specifically to handle non-Python dependencies, which would be difficult to build in setup.py.

Also, a conda package is not a replacement for a distutils/setuptools package. When building a conda package, one still calls setup.py. So every python conda package has to be a distutils/setuptools package anyway.

Thanks for the caveat. Nonetheless, anaconda makes my life so much easier when working with Python libraries. If anybody has any other reasons to be careful with it, I'm interested!
If you need to work with cutting-edge Python tools (e.g. from GitHub), it’s often easier to use virtualenv to control the versions you need installed.
conda environments support pip and arbitrary pip commands. So if you use pip to, for example, install a specific version of a library directly from GitHub, that information will be stored in your conda environment and reproduced every time you recreate it.
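For anyone who hasn't seen the yaml file mentioned in this thread: a conda environment file lists channels and conda packages, and pip-installed requirements go in a nested `pip:` list under `dependencies`. The sketch below renders one with plain string formatting (no YAML library), just to show the shape; the environment name and the GitHub URL are hypothetical placeholders.

```python
import re


def render_environment_yml(name, channels, conda_deps, pip_deps=()):
    """Render a conda environment file (environment.yml) as YAML text.

    A minimal sketch using plain string formatting instead of a YAML library.
    Requirements installed with pip go in a nested `pip:` list under
    `dependencies`, next to the conda packages.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        # List pip itself as a conda package so it is available for the pip: section.
        has_pip = any(re.split(r"[=<>!\s]", dep, maxsplit=1)[0] == "pip" for dep in conda_deps)
        if not has_pip:
            lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"    - {dep}" for dep in pip_deps]
    return "\n".join(lines) + "\n"


print(render_environment_yml(
    "analysis",
    channels=["conda-forge", "bioconda"],
    conda_deps=["python=3.6", "numpy", "pandas"],
    # Hypothetical GitHub URL, standing in for "a specific version from GitHub".
    pip_deps=["git+https://github.com/example/project.git@v0.1"],
))
```

Saved as environment.yml, a file like this is what `conda env create -f environment.yml` consumes, and `conda env export` writes pip-installed packages back out under the same `pip:` entry.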
It seems that Anaconda is under-appreciated outside of pydata circles. Before using it I had no idea that it could manage virtual environments, dependencies and different versions of python.

The fact that it's not a community-driven project might be one of the reasons.

Meanwhile, in a galaxy far away, people are also using buildout.

I have been revisiting buildout recently, and I wish there were something that merged the ease of use of pipenv with the buildout concept. Perhaps something similar to Nix throw in the mix, but more specific to a Python project. I heard that I can do this with Conda, but I never tried.

Being able to define and install external dependencies (e.g. ImageMagick, libsodium, etc.) from a configuration file local to a project is what I miss the most, especially when I'm working on several projects at once.

> Perhaps something similar to Nix throw in the mix, but more specific to a Python project.

Any examples of how Nix itself doesn't do what you need? One example I can think of: Nix doesn't support Windows.

https://nixos.org/nixpkgs/manual/#python

Nix does everything I want, but I find it hard to convince friends and coworkers to try it out. I think this is partly because Nix itself doesn't belong to the Python ecosystem, so the barrier is higher than, say, "Yeah, Pipenv is just Virtualenv+Pip".
I think Conda is the way to go, because:

- There is Miniconda, which doesn't force you to install all the PyData packages.
- Virtual envs and the packages they need are all defined in a simple yaml file.
- It works well with pip. So if a package isn't in the Conda repository, you can install it from pip. The annoyance here is that you must try conda, fail, and then try pip.
- You can easily clone envs. So you can have some base envs with your usual packages (or one for Python 2 and another for Python 3), and just clone them to start a new project.

I tend to only use anaconda for "data science work" and not for my small side projects.

I "feel" like it is overkill to use anaconda for things unrelated to 'data science' and the likes, but I'm not sure why I feel that way.

It kind of makes sense to use it for other projects as well, since you don't need to import all the things conda offers.

For one, as a package developer, publishing a source distribution of a package on PyPI is almost trivial. Publishing on Anaconda Cloud requires you to build the binary packages on all the OSes that you want to support (and for all the Python versions you want to support), which most people delegate to some CI. So there is a whole new level of complexity involved.
You can use pip-install from within a conda environment. Conda isolates better than virtualenv does, IMHO.
When I have used it (and I have to for a certain project), it is incredibly slow to resolve dependencies. So slow that I go for a walk or do something else for 15 minutes while it thinks about whatever it's doing.
Yeah, this is becoming a real problem. This didn't use to happen, but now conda is slow to the point of being unusable if you need to create environments a lot (like during testing).
> Pipenv lock can take 20-30 minutes on a small flask app (~18 dependencies)

Do you have scipy/numpy/keras or cython somewhere in the deps? pipenv lock is slow, but not 20-30 mins slow unless there's a very very large download and/or a long compilation somewhere in there.

The web app in question depends on pandas + numpy, so that's definitely part of the toolchain. It wasn't 20-30 minutes from day one. The lock time started fast and then ballooned. Other comments here saying 2-3 minutes per lock are consistent with my general experience.

This wasn't for complex pipenv operations either. A simple command: pipenv run python main.py took progressively longer to execute.

It takes about 2 minutes (feels like 5!) on my 2016 MBP to install 102 dependencies. Doing that in Docker takes about 1.5x the time. I haven't seen it take 20-30 minutes, but 2-3 minutes is still obscenely slow in my view.
A lot of this time may be spent on downloading the dependencies to the cache. If you're doing it in docker, you likely don't have a persistent cache. I've hit this issue before. https://github.com/pypa/pipenv/issues/1785

If you configure the cache properly you might solve it, but yeah it's kinda dumb it has to do that just for locking.

There is no reason for Docker to be slower, it must be some kind of configuration issue. Containers are basically just processes and there is virtually no difference in execution times.
As I said in my other reply, Docker is more likely to have ephemeral storage for the cache. So every single lock it'll re-download the package. Whereas locally, you're likely to still have the packages cached.

This can make a difference of tens of minutes for some packages which have a 1 gigabyte (!!!) download.

I have a docker project with ~20 packages in the Pipfile, the lock step of a new `pipenv install` takes about 3 minutes.
I used to use pipenv, and I found that the hard work of properly learning the Python pip/requirements.txt/setup.py/venv landscape, well enough that I don't have problems anymore, took less effort than getting pipenv to work right.
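For reference, the standard-library side of that landscape is small. A minimal sketch, assuming a hypothetical `make_env` helper: the stdlib `venv` module creates the environment, and the environment's own interpreter runs pip so packages land inside it rather than in the global site-packages.

```python
import subprocess
import sys
import venv
from pathlib import Path


def make_env(env_dir, requirements=None):
    """Create a virtual environment with the stdlib `venv` module.

    If a requirements file is given, pip is bootstrapped into the environment
    and used to install it. Returns the path of the environment's interpreter.
    """
    env_dir = Path(env_dir)
    on_windows = sys.platform == "win32"
    venv.create(
        env_dir,
        clear=True,                          # start from an empty directory
        symlinks=not on_windows,             # mirror `python -m venv` defaults
        with_pip=requirements is not None,   # ensurepip only when we need pip
    )
    python = env_dir / ("Scripts/python.exe" if on_windows else "bin/python")
    if requirements is not None:
        # Run pip through the environment's own interpreter so packages are
        # installed into env_dir, not into the global site-packages.
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python


# Usage (hypothetical paths): make_env(".venv", "requirements.txt")
```

This is roughly what `python -m venv .venv` followed by `.venv/bin/python -m pip install -r requirements.txt` does from the shell.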
> Pipenv lock can take 20-30 minutes on a small flask app (~18 dependencies)

I've never seen anything like that on a number of fairly large apps – a minute or two, at most. Are some of those dependencies extremely large or self-hosted somewhere other than PyPI?

I got so frustrated using pipenv at work that I created an alternative package manager: https://pypi.org/project/dotlock/. It's not 1.0 yet but if it suits your needs I'd love if you tried it out.
Can you elaborate on why using a requirements.txt file is "manual hell"?

I rely on it for pretty much everything and I haven't run into game-breaking problems.

The only problem I know of with requirements.txt is that many people pin particular versions there while later versions work perfectly fine. Every time I clone someone's Python project to work with, I have to manually replace all the ==s with >=s to avoid downloading obsolete versions of the dependencies, and I have never encountered a problem.
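The manual find-and-replace described here can be scripted. A minimal sketch, assuming a hypothetical `loosen_pins` helper that only rewrites simple `name==version` lines (optionally with extras) and leaves comments, `-r` includes, URLs and other specifiers alone. It trades reproducibility for freshness, so it is a convenience for local experimentation rather than for deployment.

```python
import re

# "name==version" (optional [extras], optional spaces) at the start of a line.
# The (?!=) keeps PEP 440's "===" arbitrary-equality operator untouched.
_EXACT_PIN = re.compile(r"^(\s*[A-Za-z0-9][A-Za-z0-9._-]*(?:\[[^\]]*\])?\s*)==(?!=)")


def loosen_pins(requirements_text):
    """Turn exact '==' pins into '>=' lower bounds, one line at a time.

    Comments, '-r' includes and other specifiers (>=, ~=, URLs) are left as-is.
    """
    return "".join(
        _EXACT_PIN.sub(r"\1>=", line, count=1)
        for line in requirements_text.splitlines(keepends=True)
    )


print(loosen_pins("requests==2.19.1\nDjango[argon2] == 2.1\nnumpy>=1.15\n"), end="")
```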

Anyway, for me the most annoying thing about Python project architecture (and perhaps about Python as a whole) is that you can't split a module into multiple files, so you have to either import everything manually all over the project while trying to avoid circular imports, or just put everything in a single huge source file. I usually choose the latter, and I hate it. The way namespaces and scopes work in C# feels so much better.
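For comparison, the common middle ground is to turn the module into a package whose `__init__.py` re-exports names from smaller files, so the rest of the project still imports from one place. A minimal sketch with a hypothetical `shapes` package, written to a temporary directory so the example runs on its own:

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical package, written to disk so the sketch runs as-is:
#   shapes/__init__.py  re-exports the public names
#   shapes/circle.py and shapes/square.py hold the implementations
FILES = {
    "shapes/__init__.py": """
        from .circle import Circle
        from .square import Square

        __all__ = ["Circle", "Square"]
    """,
    "shapes/circle.py": """
        import math


        class Circle:
            def __init__(self, radius):
                self.radius = radius

            def area(self):
                return math.pi * self.radius ** 2
    """,
    "shapes/square.py": """
        class Square:
            def __init__(self, side):
                self.side = side

            def area(self):
                return self.side ** 2
    """,
}

root = Path(tempfile.mkdtemp())  # throwaway directory; not cleaned up here
for relative_path, source in FILES.items():
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(textwrap.dedent(source))

sys.path.insert(0, str(root))
shapes = importlib.import_module("shapes")

# Callers use one namespace even though the code lives in three files.
print(shapes.Circle(1).area(), shapes.Square(3).area())
```

Circular imports can still bite if the submodules import each other at module level, so this does not remove that concern entirely.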

> have never encountered a problem.

Oh, so you weren't around when Requests went 2.0 backward-incompatible (because they swapped .json for .json(), or the other way around, I can't remember) and half of PyPI, with its happy-go-lucky ">=1.0", broke...?

Since then, most people have learnt that you pin first and ask questions later.

Indeed. I just hate version hell (as well as dealing with old versions of a language, although I happen to love old hardware) so much that I ignored Python entirely until the 3.6 release, waiting for the time when one could use all the Python stuff without bothering to learn anything about Python 2.x. It took 10 years of waiting, but we are finally here, and now I enjoy Python :-)
I just encountered the fun fact that Pip 18.1 broke Pipenv whereas Pip 18.0 worked just fine.
It requires a lot of work to produce reproducible/secure builds, see the original Pipfile design discussion for gory details:

https://github.com/pypa/pipfile

The problem requirements.txt doesn't solve is "what I want" versus "what I end up with".

There's no concept of explicit versus implicit dependencies. You install one package, and end up with five dependencies locked at exact versions when you do `pip freeze`. Which of those was the one you installed, and which ones are just dependencies-of-dependencies?

If you're consistent and ALWAYS update your requirements.txt first with explicit versions and NEVER use `pip freeze` you might be okay, but it's more painful than most of the alternatives that let you separate those concepts.
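The information `pip freeze` throws away is the dependency graph. A minimal sketch of the usual heuristic, assuming a hypothetical `top_level` helper: given which package requires which, the packages nobody else requires are the likely candidates for "what I asked for". The example graph is illustrative, with dependency lists from 2018-era Flask and Requests releases.

```python
def top_level(graph):
    """Return installed packages that no other installed package requires.

    `graph` maps each package name to the names it depends on: the
    relationship a flat `pip freeze` list no longer shows. Names are only
    lower-cased here; full PEP 503 normalization would also treat '-', '_'
    and '.' as equivalent.
    """
    required = {dep.lower() for deps in graph.values() for dep in deps}
    return sorted(name for name in graph if name.lower() not in required)


# Illustrative environment where only flask and requests were installed on purpose.
installed = {
    "flask": ["werkzeug", "jinja2", "itsdangerous", "click"],
    "jinja2": ["markupsafe"],
    "requests": ["chardet", "idna", "urllib3", "certifi"],
    **{leaf: [] for leaf in ["werkzeug", "itsdangerous", "click", "markupsafe",
                             "chardet", "idna", "urllib3", "certifi"]},
}
print(top_level(installed))  # ['flask', 'requests']
```

The heuristic breaks as soon as you explicitly install something that is also a dependency of another package (say, pinning `urllib3` yourself), which is exactly why tools that record the explicit list separately are less painful.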

Because if you pin stuff in requirements.txt, it either never gets updated, or you have to go through, check which packages have updates, and manually edit requirements.txt. The combination of Pipfile and Pipfile.lock was designed to solve this in a much better way (briefly: distinguishing standard deps from development deps, and using Pipfile.lock for exact/deployment pinning vs. general compatibility pinning in the Pipfile).
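To make the split concrete: Pipfile.lock is a JSON file with a `"default"` section for regular dependencies and a `"develop"` section for dev-only ones, and each entry's `"version"` already carries the `==` operator. The sketch below, a hypothetical `lock_to_requirements` helper, flattens it into exact pins (2018-era pipenv had a built-in for this, `pipenv lock -r`).

```python
import json


def lock_to_requirements(lock_text, include_dev=False):
    """Flatten a Pipfile.lock into exact 'name==version' requirement lines.

    Only "default" is used unless include_dev is True, in which case
    "develop" entries are added too. Entries without a "version" (for
    example VCS or editable installs) are skipped in this sketch.
    """
    lock = json.loads(lock_text)
    sections = ["default", "develop"] if include_dev else ["default"]
    seen, lines = set(), []
    for section in sections:
        for name, entry in sorted(lock.get(section, {}).items()):
            if name in seen or "version" not in entry:
                continue
            seen.add(name)
            line = name + entry["version"]
            if "markers" in entry:
                # Environment markers, e.g. python_version < '3.4'
                line += "; " + entry["markers"]
            lines.append(line)
    return lines


sample = json.dumps({
    "default": {"flask": {"version": "==1.0.2"}, "click": {"version": "==7.0"}},
    "develop": {"pytest": {"version": "==3.9.1"}},
})
print("\n".join(lock_to_requirements(sample, include_dev=True)))
```

The lock file also stores sha256 hashes for each package; pip's `--hash` option could carry them into the generated file, which this sketch leaves out.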
That is not my experience. Until recently, I used to pin versions in requirements.txt, then from time to time I removed the pinned versions, reinstalled everything, tested and added new versions to requirements.txt. Most of the work was testing for incompatibilities, but no package manager will help you there.

Recently I switched to pipenv because Zappa insists on a virtualenv (as an app dev I never had any need for one, but it seems my case is an exception, as I almost never work on multiple apps in parallel). Pipenv does make version management a bit easier, but it wasn't difficult (for me) to begin with.

From talking with other developers I know my view is somewhat unorthodox, but I haven't encountered the problems they describe, or the pain hasn't been that big for me to embrace all the issues that come with virtualenvs.

Btw, it is possible to use the compatible release operator ~= (PEP 440) within requirements.txt.
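In requirements.txt, a line such as `requests~=2.19` accepts any 2.x release from 2.19 onward. PEP 440 defines `~= V.N` as `>= V.N, == V.*`; the sketch below spells that rule out for plain numeric versions only (the helper names are hypothetical). Real tools implement the full rules, including pre-releases and epochs, for example via `SpecifierSet` in the `packaging` library.

```python
def _release(version):
    """Parse a plain release string such as '1.4.5' into (1, 4, 5).

    Simplification: PEP 440 epochs, pre/post/dev releases and local
    versions are not handled by this sketch.
    """
    return tuple(int(part) for part in version.split("."))


def _padded(parts, length):
    """Pad with zeros, since PEP 440 treats 1.4 and 1.4.0 as equal."""
    return parts + (0,) * (length - len(parts))


def is_compatible(candidate, spec):
    """Check `candidate` against the clause '~= spec'.

    The version must be at least `spec`, and every release component
    except the last one in `spec` must stay the same.
    """
    want = _release(spec)
    if len(want) < 2:
        raise ValueError("'~=' needs at least two release components, e.g. ~=2.2")
    have = _release(candidate)
    width = max(len(want), len(have))
    at_least = _padded(have, width) >= _padded(want, width)
    prefix = want[:-1]
    same_prefix = _padded(have, width)[:len(prefix)] == prefix
    return at_least and same_prefix


for candidate in ["2.19.1", "2.25.0", "3.0"]:
    print(candidate, is_compatible(candidate, "2.19"))
```

With `~=1.4.5` instead, 1.4.9 is accepted but 1.5.0 is not, which is why the operator makes a reasonable middle ground between exact pins and open-ended `>=`.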
Or just use pip-tools to automatically update dependency versions.
I thought I was the only one who couldn't get pipenv to work. I tried it out after it was released and it was buggy. Every time I used it afterwards I would hit some weird edge case that made me go back to pip and virtualenv.
Something sounds broken here. I have projects with similar numbers of dependencies, including heavy ones like pandas and numpy, and locking doesn't take anywhere near that long. I don't have a specific suggestion for you, though. How long does a regular `pip install -r requirements.txt` take for the same dependencies?
Pipenv is hopelessly slow. It's a shame. Remember when git first came out and it changed the way we worked because it was so quick to commit now? (I fully expect that most git users here don't remember that, actually). There is no going back. I will not use slow tools. My tools need to be at the very least as fast as me.
> Pipenv is hopelessly slow.

Interesting, this has never been a problem for me. I've built some large tools and while it isn't fast, it's always completed in a few minutes.

A few minutes??! That sounds very slow.
To be clear: with few deps it's very fast for me; it's just larger projects with LOTS of non-trivial deps where it can slow down.
What OS?
Mid-2015 MacBook Pro running the newest OS
It is abundantly clear that the pipenv developers use MacOS so I wonder if it's an OS dependent thing.
My current project has 16 dependencies at the moment, and ... it's really not as bad as you make it sound.

    pipenv lock  5.65s user 0.29s system 77% cpu 7.639 total
I think 7.6 seconds is fine for an operation that you rarely do.

It would probably take ages at work, though. Just opening a WSL terminal takes several seconds there, while it's predictably instantaneous (<100ms) on Fedora Linux at home.

SSD vs HDD, maybe?