Hacker News
by ghshephard 2023 days ago
I'm not sure what you mean by "How do you deal with keeping your top level dependencies and exact versions of all of your dependencies of dependencies separate in a way that's sane and 100% reproducible for a typical web app / repo that might not in itself be a Python package?"

We use requirements.txt + Docker/k8s to lock in the OS. All of the versions of python modules are defined like:

   six==1.11.0
   sqlalchemy==1.2.7
   squarify==0.3.0
This locks each of them to a particular version.
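
A file in this style is easy to sanity-check automatically. Here is a minimal sketch, assuming plain name==version lines (no extras, environment markers, URLs or -r includes). find_unpinned is a hypothetical helper, not part of pip.

```python
import re

# Matches a simple exact pin such as "six==1.11.0" (no extras, markers or URLs).
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[^=\s]+$")


def find_unpinned(lines):
    """Return requirement lines that are not exact '==' pins."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    reqs = ["six==1.11.0", "sqlalchemy==1.2.7", "squarify==0.3.0", "vine>=5.0.0"]
    print(find_unpinned(reqs))  # ['vine>=5.0.0']
```

Passing this check only tells you that the lines you listed are pinned. It says nothing about packages that aren't listed at all, which is what the replies below are getting at.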

What type of dependencies aren't covered by this? (I genuinely am a novice here, so I would love to be informed where this runs into problems.)


> What type of dependencies aren't covered by this? (I genuinely am a novice here, so I would love to be informed where this runs into problems.)

The dependencies of the packages you have listed aren't guaranteed to be locked.

For example, let's say, for argument's sake, you were using celery.

When you install celery, by default these deps will also be installed: https://github.com/celery/celery/blob/master/requirements/de...

Those aren't very locked down. If you install celery today you might get vine 5.0.0, but X months from now you might get 5.9.4, which could have backwards-compatibility issues with what celery expects.

So you build your app today and everything works, but X months from now you build the same app with the same celery version and things break, because celery isn't compatible with that version of vine.
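
To make the drift concrete, here is a toy model of how an installer picks a version for a loose range: it takes the newest release that satisfies the range at install time. This is a sketch, not pip's actual resolver. newest_matching is a hypothetical name, versions are assumed to be plain dotted integers, and the vine release lists and the >=5.0.0,<6.0 range are made up for illustration.

```python
def parse_version(text):
    """Turn '5.9.4' into (5, 9, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def newest_matching(available, minimum, below=None):
    """Pick the newest available version >= minimum (and < below, if given).

    Returns None when nothing on the index satisfies the range.
    """
    lo = parse_version(minimum)
    hi = parse_version(below) if below else None
    candidates = [
        v for v in available
        if parse_version(v) >= lo and (hi is None or parse_version(v) < hi)
    ]
    return max(candidates, key=parse_version) if candidates else None


if __name__ == "__main__":
    # Hypothetical index contents for vine, today and X months from now.
    index_today = ["1.3.0", "5.0.0"]
    index_later = ["1.3.0", "5.0.0", "5.9.4"]
    print(newest_matching(index_today, "5.0.0", "6.0"))  # 5.0.0
    print(newest_matching(index_later, "5.0.0", "6.0"))  # 5.9.4
```

Nothing in your own requirements changed between the two calls; only the index did. With no upper bound at all (below=None), the same logic will just as happily pick a new major version.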

This happened a few months ago. Celery 4.3.0 didn't version-lock vine at all, and suddenly every celery 4.3.0 install broke, even though it had worked in the past. That was tracked at https://github.com/celery/celery/issues/3547.

Docker doesn't help you here either, because your workflow might be something like this:

- Dev works locally and everything builds nicely when you docker-compose build

- Dev pushes to CI

- CI builds new image based on your requirements.txt

- CI runs tests and probably passes

- PR gets merged into master

- CI kicks in again and builds + tests + pushes the built image to a Docker registry if all is good

- Your apps use this built image

But there's no guarantee that what you built in dev ends up in prod. CI could have pulled in newer versions of certain deps, especially if a PR lingered for days before it got merged.
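
One way to notice this kind of drift is to run pip freeze inside each image (the one built in dev and the one built in CI) and compare the two outputs. A minimal sketch of that comparison, assuming simple name==version lines; diff_pins is a hypothetical helper and the package versions are made-up sample data.

```python
def parse_pins(lines):
    """Map lower-cased package name -> version for 'name==version' lines."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(expected_lines, actual_lines):
    """Return {name: (expected, actual)} for every package that differs.

    A package missing on one side shows up with None on that side.
    """
    expected, actual = parse_pins(expected_lines), parse_pins(actual_lines)
    return {
        name: (expected.get(name), actual.get(name))
        for name in sorted(expected.keys() | actual.keys())
        if expected.get(name) != actual.get(name)
    }


if __name__ == "__main__":
    dev = ["celery==4.3.0", "vine==1.3.0"]  # sample freeze from the dev image
    ci = ["celery==4.3.0", "vine==5.0.0", "kombu==4.6.11"]  # sample CI freeze
    print(diff_pins(dev, ci))
    # {'kombu': (None, '4.6.11'), 'vine': ('1.3.0', '5.0.0')}
```

A non-empty result could fail the CI step, although that only detects drift after the fact; it doesn't prevent it the way a committed lock file does.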

A lock file prevents this: when a lock file is present, it is what gets used. So if you generate one and commit it to version control, CI builds exactly what you pushed from dev, and the chain from dev to prod is guaranteed to use the versions you want. That is how it works in Ruby, Elixir, Node and other languages too. They have 2 files: a regular file where you put your top-level deps, and a machine-generated lock file. In Python's world, a lock file would translate to what pip3 freeze returns.
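
For reference, the information pip freeze writes out can be read from Python's standard importlib.metadata module. This is a rough sketch, not pip's implementation. freeze_environment is a hypothetical name, and unlike pip freeze it doesn't hide packaging tools such as pip or setuptools, and it doesn't special-case editable or VCS installs.

```python
from importlib.metadata import distributions


def freeze_environment():
    """Return 'name==version' lines for every installed distribution
    visible to this interpreter, sorted by lower-cased name."""
    pins = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip distributions with broken metadata
        # Keep the first occurrence if a distribution is found twice.
        pins.setdefault(name.lower(), f"{name}=={dist.version}")
    return [pins[key] for key in sorted(pins)]


if __name__ == "__main__":
    print("\n".join(freeze_environment()))
```

Redirecting that output to a file and committing it gives you the machine-generated half of the two-file setup; the hand-written half (your top-level deps) still has to live somewhere else.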

Thanks very much - your description of our workflow is really good (and is pretty close to exactly what we have!)

I don't understand which dependencies everyone keeps talking about (it seems to be a big deal with Poetry). When you run:

    pip freeze

it captures every single Python module, dependencies included. Because everything in the requirements file is listed as aaaaaaa==x.y.z, you are guaranteed to have the exact same version.

We have all sorts of turf wars when someone wants to roll forward the version of a module, and in the case of the big ones (Pandas), we sometimes hold off for 6-9 months before rolling it forward.

But there is something Poetry is doing that is better than "pip freeze". I think once I figure that out, I'll have an "aha" moment and start evangelizing it. I just haven't got there yet.

You need a constraints file:

https://pip.pypa.io/en/stable/user_guide/#constraints-files

    pip install celery==4.3.0 -c constraints.txt

Here constraints.txt pins an exact version for everything, transitive dependencies included.
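
A toy model of what the constraints file buys you: when a package is pinned in constraints.txt, that version is used no matter what newer releases exist, and anything left out falls back to the newest release. This is only a sketch of the idea, not pip's resolver. read_constraints and choose are hypothetical names, versions are assumed to be plain dotted integers, and the release lists are made up. In real pip, a constraint also never causes a package to be installed by itself; the toy doesn't model that part.

```python
def read_constraints(lines):
    """Parse 'name==version' lines into {lower-cased name: version}."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def parse_version(text):
    """'5.9.4' -> (5, 9, 4), so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def choose(name, available, constraints):
    """Toy resolver step: use the pinned version if constrained, else newest."""
    pinned = constraints.get(name.lower())
    if pinned is not None:
        if pinned not in available:
            # Stand-in for the resolution error a real installer would report.
            raise LookupError(f"{name}=={pinned} is not available")
        return pinned
    return max(available, key=parse_version)


if __name__ == "__main__":
    constraints = read_constraints(["vine==1.3.0  # pinned on purpose"])
    # Hypothetical release lists on the package index.
    print(choose("vine", ["1.3.0", "5.0.0", "5.9.4"], constraints))  # 1.3.0
    print(choose("amqp", ["2.5.2", "2.6.1"], constraints))  # 2.6.1
```

The amqp call shows why the constraints file has to cover everything: a transitive dependency you forgot to pin is resolved fresh on every build.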

Why not keep the same Docker image throughout the lifecycle? E.g. merge to the dev branch, trigger CI (build the image at this point), maybe deploy to a test environment, run more tests, then deploy to prod. There's no chance of packages changing, since the image isn't rebuilt. Of course, if you're not using Docker, a lock file (i.e. actual dependency resolution) would seem essential for reproducibility.

First, how did you generate that requirements file?

Second, how do you separate dev dependencies from prod dependencies, and how do you update a dependency and ensure all of its transitive dependencies are resolved appropriately?

    pip freeze > requirements.txt

This lists every Python module that's installed in the virtual environment. So, from my (admittedly new) understanding, that guarantees that in the production/devel/Docker environment, every Python module will be identical to whatever was installed in the virtualenv.

Dependencies and transitive dependencies are guaranteed to be resolved because we list every one of them in the requirements.txt file.

Yes, this works as you expect. What it lacks are three big things: an easy way to see what your direct dependencies are, a way to separate dev dependencies from prod dependencies, and an easy way to update a dependency while resolving all of its transitive dependencies.
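
To illustrate the first two points: a flat pip freeze list doesn't record which packages you asked for on purpose, so that information has to be kept somewhere else. The sketch assumes two hypothetical hand-maintained lists of top-level names (prod and dev) plus a freeze-style list; split_requirements is a hypothetical helper and the package versions are sample data.

```python
def split_requirements(frozen_lines, prod_top_level, dev_top_level):
    """Sort pip-freeze style lines into direct prod, direct dev and the rest.

    prod_top_level and dev_top_level are names someone chose on purpose;
    everything else in the freeze output is treated as transitive.
    """
    prod = {name.lower() for name in prod_top_level}
    dev = {name.lower() for name in dev_top_level}
    groups = {"prod": [], "dev": [], "transitive": []}
    for raw in frozen_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blanks and comments
        name = line.split("==", 1)[0].strip().lower()
        if name in prod:
            groups["prod"].append(line)
        elif name in dev:
            groups["dev"].append(line)
        else:
            # Can't tell whose dependency this is without the dependency graph.
            groups["transitive"].append(line)
    return groups


if __name__ == "__main__":
    frozen = [  # sample data in pip freeze style
        "celery==4.3.0",
        "pluggy==0.13.1",
        "pytest==5.4.3",
        "six==1.11.0",
        "vine==1.3.0",
    ]
    groups = split_requirements(frozen, ["celery", "six"], ["pytest"])
    print(groups["transitive"])  # ['pluggy==0.13.1', 'vine==1.3.0']
```

Note that vine (pulled in by celery) and pluggy (pulled in by pytest) land in the same "transitive" bucket. Attributing them to prod or dev requires the actual dependency graph, which is the kind of thing a resolver-based tool such as Poetry keeps track of for you.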

There are other shortcomings, but those are the big ones.