by nickjj 2023 days ago
How do you deal with keeping your top level dependencies and exact versions of all of your dependencies of dependencies separate in a way that's sane and 100% reproducible for a typical web app / repo that might not in itself be a Python package?

I'm in the same boat as you in that I'd like to keep using pip, but the lack of a lock file is very dangerous because it doesn't guarantee reproducible builds (even if you use Docker).

In Ruby, Elixir and Node the official package managers have the idea of a lock file. That is the only reason I ever look into maybe switching away from pip.

Running a pip freeze to generate a requirements.txt file doesn't work nicely when you use a requirements.txt file to define your top level dependencies.

I've been bitten by issues like this so many times in the past with Python where I forgot to define and pin some inner dependency of a tool. Like werkzeug when using Flask. Or a recent issue with Celery 4.3.0 where they forgot to version lock a dependency of their own and suddenly builds that worked one day started to break the next day. These sets of problems go away with a lock file.

4 comments

> How do you deal with keeping your top level dependencies and exact versions of all of your dependencies of dependencies separate in a way that's sane and 100% reproducible for a typical web app / repo that might not in itself be a Python package?

`pip-compile` from `pip-tools` is my go-to for this.

> Running a pip freeze to generate a requirements.txt file doesn't work nicely when you use a requirements.txt file to define your top level dependencies.

Use setup.cfg to define your top level dependencies. Use requirements.txt as your "lock" file. But even then you won't get reproducible builds across different OSes, or with different non-Python things installed on your machines. Use Docker images to guarantee staging and production will be identical.

There is one! Pip supports, since a few years back, a constraints file (i.e. a lock file) that does just this.

This is a nice guide on how to use it: https://www.lutro.me/posts/better-pip-dependency-management

This isn't quite the same. Yes, I can update my root dependencies in the `requirements.txt` file, run `pip install -r requirements.txt` and then run `pip freeze > requirements.txt`, but that's convoluted and requires me to know exactly what my root dependencies are. Is `astroid` something our tools use directly, or is it just a dependency of `pylint`? It's not clear. A lockfile clears this up.

Yes, and in addition to the requirements file, pip supports a constraints file, which is the lockfile you describe. It's separate from the requirements file. It solves exactly this problem.

Docs: https://pip.pypa.io/en/stable/user_guide/#constraints-files
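
For an environment that is already installed, the `astroid` versus `pylint` question can be approximated by asking which installed distributions nothing else requires. Below is a minimal sketch using only the standard library. `normalize`, `root_packages`, and `installed_requirements` are hypothetical helper names, and the name parsing is a simplification, not a full requirement parser:

```python
from __future__ import annotations

import re
from importlib import metadata

# Leading project name of a requirement string such as "astroid>=2.0".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Lowercase a project name and collapse runs of '-', '_' and '.'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def root_packages(requires_by_package: dict[str, list[str]]) -> list[str]:
    """Return the packages that no other package in the mapping requires."""
    required = set()
    for owner, requirements in requires_by_package.items():
        for requirement in requirements:
            match = _NAME.match(requirement)
            if match and normalize(match.group(1)) != normalize(owner):
                required.add(normalize(match.group(1)))
    return sorted(
        name for name in requires_by_package if normalize(name) not in required
    )


def installed_requirements() -> dict[str, list[str]]:
    """Map every installed distribution to its declared requirements."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            result[name] = list(dist.requires or [])
    return result
```

Calling `root_packages(installed_requirements())` inside the virtualenv returns candidates for root dependencies. Requirements hidden behind extras or environment markers still count as "required" here, so treat the result as a hint, not as a replacement for a file that records your top level dependencies.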

The docs for this mention:

> Including a package in a constraints file does not trigger installation of the package.

Maybe I'm not following something but how do you get all of this to work like a lock file in other package managers?

Let's use Ruby as a working example:

1. You start a new project and you have a Gemfile.

2. This Gemfile is where you define your top level dependencies very much like a requirements.txt file. You can choose to version lock these dependencies if you'd like (it's a best practice), but that's optional.

3. You run `bundle install`

4. All of your dependencies get resolved and installed

5. A new Gemfile.lock file was created automatically for you. This is machine generated and contains a list of all dependencies (top level and every dependency of every dependency) along with locking them to their exact patch versions at the point of running step 3.

6. The next time you run `bundle install` it will detect that a Gemfile.lock file exists and use that to figure out what to install

7. If you change your Gemfile and run `bundle install` again, a new Gemfile.lock will be generated

8. You commit both the Gemfile and Gemfile.lock to version control and git push it up

At this point you're safe. If another developer clones your repo or CI runs today or 3 months from now everyone will get the same exact versions of everything you had at the time of pushing it.

It should be the same process, except that the constraints file is not automatically created or detected, so step 5 would be "pip freeze >constraints.txt" and step 6 would be "pip install -r requirements.txt -c constraints.txt".

The top level dependencies go in requirements.txt and trigger installation of those packages. Everything else goes in the constraints file, which constrains the version that will be installed if something triggers an installation of the package, but it doesn't by itself trigger the installation - it only locks/constrains the versions.
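
The split described above can be illustrated with a toy model. This is a sketch of the idea only, not pip's resolver: `plan_install` is a hypothetical function, the dependency graph is passed in by hand, and an unconstrained package is simply marked `"latest"`.

```python
from __future__ import annotations


def plan_install(
    requirements: list[str],
    dependencies: dict[str, list[str]],
    constraints: dict[str, str],
) -> dict[str, str]:
    """Toy model: requirements trigger installs, constraints only pin versions."""
    plan: dict[str, str] = {}
    pending = list(requirements)
    while pending:
        name = pending.pop()
        if name in plan:
            continue
        # A constraint decides the version of a package that is being
        # installed anyway; it never adds a package to the plan by itself.
        plan[name] = constraints.get(name, "latest")
        pending.extend(dependencies.get(name, []))
    return plan
```

With `flask` as the only requirement, a constraints entry for `celery` has no effect on the plan, while entries for `flask` and its dependencies fix their versions.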

Wouldn't you also need to run a pip3 install -r requirements.txt before the pip freeze?

Otherwise pip freeze won't find any dependencies.

So you end up having to run something like this:

    pip3 install -r requirements.txt
    pip3 freeze > requirements-lock.txt
    pip3 install -r requirements.txt -c requirements-lock.txt

Mainly because it seems you can't run `pip3 install -c requirements-lock.txt` on its own. It requires the -r flag.

That is a lot more inconvenient than running `bundle install`, and if you use Docker it gets trickier: a new lock file would get generated on every build, which defeats the purpose of it. Ideally you'd want to use the existing lock file in version control, not generate a new one every time you build your image.
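
One way to get closer to the `bundle install` behavior is a small wrapper that only generates the lock file when it is missing and otherwise installs against the committed one. A rough sketch, assuming the file names used above; `install_commands` and `main` are hypothetical names, and nothing runs until `main()` is called:

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def install_commands(
    lock_exists: bool,
    requirements: str = "requirements.txt",
    lock: str = "requirements-lock.txt",
) -> list[list[str]]:
    """Build the pip commands to run, depending on whether a lock file exists."""
    pip = [sys.executable, "-m", "pip"]
    if lock_exists:
        # Reuse the committed lock file as a constraints file.
        return [pip + ["install", "-r", requirements, "-c", lock]]
    # No lock yet: install the top level dependencies unconstrained.
    return [pip + ["install", "-r", requirements]]


def main() -> None:
    lock = Path("requirements-lock.txt")
    had_lock = lock.exists()
    for command in install_commands(had_lock):
        subprocess.run(command, check=True)
    if not had_lock:
        # Record what was just installed so the next run is constrained by it.
        frozen = subprocess.run(
            [sys.executable, "-m", "pip", "freeze"],
            check=True,
            capture_output=True,
            text=True,
        )
        lock.write_text(frozen.stdout)
```

In a Docker build the lock file would be copied in from version control, so the image build always takes the first branch and never regenerates it.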

Nice! I’ll be adding this to my virtualenv + requirements.txt + pip process. Not sure why everyone wants to overcomplicate Python dependency management with pyenv/poetry/etc.

How do you validate versions against the constraint file? I read your original link a few times and didn’t see it.

    pip install -r requirements.txt -c constraints.txt

You can also, thanks to the weird way requirements.txt works, put the line "-c constraints.txt" in requirements.txt. In that case you don't have to specify it when you run pip.

That should apply the constraints when installing packages. I don't know if there's also a way to validate what's already installed.
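
One possible way to validate what's already installed is to compare the environment against the pins in the constraints file yourself. A sketch using only the standard library, assuming the file contains exact `name==version` lines. `parse_pins`, `find_mismatches`, and `installed_versions` are hypothetical names, and versions are compared as plain strings, so `1.0` and `1.0.0` count as different:

```python
from __future__ import annotations

import re
from importlib import metadata

# Matches exact pins such as "six==1.11.0", ignoring comments and markers.
_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#\\=]+)")


def normalize(name: str) -> str:
    """Lowercase a project name and collapse runs of '-', '_' and '.'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> dict[str, str]:
    """Extract {name: version} from the exact pins in a constraints file."""
    pins = {}
    for line in text.splitlines():
        match = _PIN.match(line)
        if match:
            pins[normalize(match.group(1))] = match.group(2)
    return pins


def find_mismatches(
    pins: dict[str, str], installed: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return {name: (pinned, installed)} for installed packages off their pin.

    Pinned packages that are not installed are skipped, because a constraint
    does not require the package to be present.
    """
    return {
        name: (pinned, installed[name])
        for name, pinned in pins.items()
        if name in installed and installed[name] != pinned
    }


def installed_versions() -> dict[str, str]:
    """Map normalized names of installed distributions to their versions."""
    return {
        normalize(dist.metadata.get("Name")): dist.version
        for dist in metadata.distributions()
        if dist.metadata.get("Name")
    }
```

Reading `constraints.txt` and calling `find_mismatches(parse_pins(text), installed_versions())` returns an empty dict when everything installed matches its pin.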

I'm not sure what you mean by, "How do you deal with keeping your top level dependencies and exact versions of all of your dependencies of dependencies separate in a way that's sane and 100% reproducible for a typical web app / repo that might not in itself be a Python package?"

We use requirements.txt + Docker/k8s to lock in the OS. All of the versions of python modules are defined like:

    six==1.11.0
    sqlalchemy==1.2.7
    squarify==0.3.0

Which locks them to a particular version.

What type of dependencies aren't covered by this (I genuinely am a novice here so would love to be informed where this runs into problems)

> What type of dependencies aren't covered by this (I genuinely am a novice here so would love to be informed where this runs into problems)

The dependencies of the dependencies of what you have listed aren't guaranteed to be locked.

For example, let's say for argument's sake you were using celery.

When you install celery, by default these deps will also be installed: https://github.com/celery/celery/blob/master/requirements/de...

Those aren't very locked down. If you install celery today you might get vine 5.0.0, but X months from now you might get 5.9.4, which could have backwards compatibility issues with what celery expects.

So now you build your app today and everything works but X months from now you build your same app with the same celery version and things break because celery isn't compatible with that version of vine.

This happened a few months ago. Celery 4.3.0 didn't version lock vine at all and suddenly all celery 4.3.0 versions broke when they worked in the past. That was tracked at https://github.com/celery/celery/issues/3547.
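
To see how exposed a hand-written requirements.txt is to this, you can list the installed packages it does not pin. A minimal sketch, assuming exact `name==version` lines; `pinned_names` and `unlocked` are hypothetical names, and the list of installed packages is passed in by the caller:

```python
from __future__ import annotations

import re

# Matches exact pins such as "celery==4.3.0".
_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*[^\s;#\\=]+")


def normalize(name: str) -> str:
    """Lowercase a project name and collapse runs of '-', '_' and '.'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pinned_names(requirements_text: str) -> set[str]:
    """Return the normalized names that the file pins to an exact version."""
    names = set()
    for line in requirements_text.splitlines():
        match = _PIN.match(line)
        if match:
            names.add(normalize(match.group(1)))
    return names


def unlocked(requirements_text: str, installed: list[str]) -> list[str]:
    """Return installed package names that the requirements file does not pin."""
    pinned = pinned_names(requirements_text)
    return sorted({normalize(name) for name in installed} - pinned)
```

For a file that only contains `celery==4.3.0`, an environment with `celery`, `kombu`, and `vine` installed reports `kombu` and `vine` as unlocked, which is exactly the gap a lock file closes.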

Docker doesn't help you here either because your workflow might be something like this:

- Dev works locally and everything builds nicely when you docker-compose build

- Dev pushes to CI

- CI builds new image based on your requirements.txt

- CI runs tests and probably passes

- PR gets merged into master

- CI kicks in again and builds + tests + pushes the built image to a Docker registry if all is good

- Your apps use this built image

But there's no guarantee what you built in dev ends up in prod. Newer versions of certain deps could have been built in CI. Especially if a PR has been lingering for days before it gets merged.

A lock file prevents this because if a lock file is present the lock file gets used, so if you built and included a lock file in version control, then CI will build what you pushed from dev, so the chain is complete from dev to prod for guaranteeing the versions you want. That is how it works with Ruby, Elixir, Node and other languages too. They have 2 files (a regular file where you put your top level deps and a machine generated lock file). A lock file in Python's world would translate to what pip3 freeze returns.

Thanks very much - your description of our workflow is really good (and is pretty close to exactly what we have!)

I don't understand what dependencies everyone keeps talking about (which seems to be a big deal with Poetry) - when you run: pip freeze

It captures every single python module, dependencies as well. Because everything in the dependencies file is listed as: aaaaaaa==xy.z

You are guaranteed to have the exact same version.

We have all sorts of turf wars when someone wants to roll forward the version of a module, and, in the case of the big ones (Pandas) we sometimes hold off for 6-9 months before rolling it forward.

But there is something that Poetry is doing that is better than "pip freeze" - I think once I figure that out, I'll have an "aha" moment and start evangelizing it. I just haven't got there yet.

You need a constraints file

https://pip.pypa.io/en/stable/user_guide/#constraints-files

    pip install celery==4.3.0 -c constraints.txt

Where constraints.txt defines exact versions for everything

Why not keep the same docker image throughout the lifecycle? E.g. merge to dev branch, trigger ci (build image at this point), maybe deploy to a test environment, run more tests, then deploy to prod. No chance of packages changing since the image isn't rebuilt. Of course if not using docker, a lock file (i.e. actual dependency resolution) would seem essential for reproducibility.

First, how did you generate that requirements file?

Second, how do you separate dev dependencies from prod dependencies, and how do you update a dependency and ensure all of its transitive dependencies are resolved appropriately?

    pip freeze > requirements.txt

Lists every python module that's been installed into the virtual environment. So, from my (admittedly new) understanding, that means we guarantee that in the production/devel/docker environment every python module will be identical to whatever was installed in the virtual env.

Dependencies and transitive dependencies are guaranteed to be resolved/ensured because we list every one of them out in the requirements.txt file.

Yes, this works as you expect. What it lacks are three big things: making it easy to see what your direct dependencies are, separating dev dependencies from prod dependencies, and an easy way to update a dependency while resolving all transitive dependencies.

There are other shortcomings, but those are the big ones.