Hacker News
by andai 598 days ago
There was a discussion the other day about how Python devs apparently don't care enough for backwards compatibility. I pointed out that I've often gotten Python 2 code running on Python 3 by just changing print to print().

But then a few hours later, I tried running a very small project I wrote last year and it turned out that a bunch of my dependencies had changed their APIs. I've had similar (and much worse) experiences trying to get older code with dependencies running.

My point with this comment is that if the average developer's reality is that backwards compatibility isn't really a thing anyway, then we are already paying for that downside, so we might as well get some upside.

9 comments

It's hard to comment on this without knowing more about the dependencies and when/how they changed their APIs. I would say if it was a major version change, that isn't too shocking. For a minor version change, it should be.

Stuff that is actually included with Python tends to be more stable than random Pypi packages, though.

NPM packages also sometimes change. That's the world.

The big difference is that npm will automatically (since 2017) save a version range to the project metadata, and will automatically create this metadata file if it doesn't exist. Same for other package managers in the Node world.

I just installed Python 3.13 with pip 24.2, created a venv and installed a package - and nothing, no file was created and nothing was saved. Even if I touch requirements.txt and pyproject.toml, pip doesn't save anything about the package.

This creates a massive gap in usability of projects by people not very familiar with the languages. Node-based projects sometimes have issues because dependencies changed without respecting semver, but Python projects often can't be installed and you have no idea why without spending lots of time looking through versions.

Of course there are other package managers for Python that do this better, but pip is still the de-facto default and is often used in tutorials for new developers. Hopefully uv can improve things!

I recommend starting to use uv.

It is very fast and tracks the libraries you are using.

After years of venv/pip, I'm not going back (unless a client requires it).

Another nice thing about uv is it can install python itself in the venv.

So no need to mess around with brew/deadsnakes and multiple global python versions on your dev system.

This is actually an improvement over the node/nvm approach.

> Of course there are other package managers for Python that do this better

I think if you are comparing with what NPM does then you would have to say that native pip can do that too. It is just one command

`pip freeze > requirements.txt`

It does include everything in the venv (or in your environment in general), but if you stick to adding only the required things (one venv for each project), then you will get a clean requirements.txt file for each project.
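As a rough illustration of what that command records, here is a minimal sketch that builds similar `name==version` lines with the standard library's `importlib.metadata`. It is not pip's implementation (the helper name `freeze_lines` is made up for this example, and editable or VCS installs are ignored); it only shows that the output is a snapshot of whatever is installed in the current environment.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return sorted 'name==version' lines for every installed distribution.

    Hypothetical helper: approximates the shape of `pip freeze` output only.
    """
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with broken or missing metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Run inside a project's venv, this prints one pinned line per package.
    print("\n".join(freeze_lines()))
```

Because it lists everything installed, the result is only as clean as the environment it is run in, which is why the one-venv-per-project habit matters.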

Sure, you can manually do that. But my point is that pip doesn't do this automatically, and that is what makes so many Python projects essentially unusable without investing massive amounts of time into debugging. Good defaults and good standards matter a lot.
> without investing massive amounts of time into debugging

Again, even if you have to spend some time learning something, there are better tools for this, like the uv and Poetry package managers. That is not a massive amount of time. And even pip freeze is just one standard command that gives you a portable environment, because everything in your environment will be pinned. You just shouldn't do everything in the system-global environment, which is common sense, not a huge ask.

So I am not sure what massive amount of debugging is needed for that.

I am specifically talking about the scenario where I, someone experienced with Python, am trying to use a project from someone who is not experienced in Python. There is nothing I can change since the project was already created some time ago. Most of the time, the only way for me to get the project running is to spend a lot of time debugging version issues - to find combinations of dependencies that are installable, and then to see if they work in the application context, and then to see if any bugs are part of the application or the dependency.

You might explain that away by asking why I'd want to run these projects, but a large percentage of data science projects I've tried to run end up in this scenario, and it simply doesn't happen with npm. A command existing is not a valid replacement for good defaults, since it literally affects my ability to run these projects.

I don't think this is the same. Does it also cover transitive dependencies?
Sorry if what I said about NPM is not accurate. But in reality, if you are pinning the dependencies (all of them actually get pinned), then when pip installs them it will grab the correct version of each transitive dependency (both packages are pinned).

So I am not sure when this will become a problem.

All that can be specified in a pyproject.toml.

As some posters mentioned uv takes care of a lot of that and you can even pin it to a version of python.

If it’s just a one off script you can get all the dep information in the script header and uv can take care of all the venv/deps for you if you transfer the script to another machine by reading the headers in a comment section at the start of the script.
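The script header mentioned here is the inline script metadata format standardised in PEP 723. Below is a minimal sketch of what such a header looks like and how a tool could pull it out of the file. The example script contents and the helper name `extract_script_metadata` are assumptions for illustration, and the parser is deliberately simplified compared with the reference implementation in the PEP.

```python
# A hypothetical one-off script with a PEP 723 header at the top.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "tqdm>=4.66.2,<5",
# ]
# ///
print("hello")
'''


def extract_script_metadata(source):
    """Return the TOML text inside a '# /// script' comment block, or None."""
    collected = []
    inside = False
    for line in source.splitlines():
        if not inside:
            if line == "# /// script":
                inside = True
            continue
        if line == "# ///":
            return "\n".join(collected)
        if line.startswith("# "):
            collected.append(line[2:])  # drop the comment prefix
        elif line == "#":
            collected.append("")  # an empty comment line is an empty TOML line
        else:
            return None  # the block was never closed properly
    return None


if __name__ == "__main__":
    print(extract_script_metadata(EXAMPLE_SCRIPT))
```

The extracted text is ordinary TOML, so a tool can hand it to a TOML parser and then build a throwaway environment from the `dependencies` list; with uv, running such a file via `uv run` is the usual entry point.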

All this is based on PEPs to standardise packaging. It’s slow but moving in the right direction.

What do I have to put into pyproject.toml so that pip saves dependency ranges by default?
So pyproject.toml will be used by uv and others, like Poetry. Pip uses a requirements.txt for dependency management.

Using uv as an example[0]:

uv add "tqdm >=4.66.2,<5"

[0]https://docs.astral.sh/uv/concepts/dependencies/#project-dep...

I don't have access to uv to test that command at the moment, but that should work. uv then installs the dependency in the .venv directory in the project directory. This may include a specific version of python as well, if you pin one.
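To make the meaning of a range such as `>=4.66.2,<5` concrete, here is a small sketch that checks plain dotted versions against it. This is a simplification written for this example: real tools follow the PEP 440 rules (pre-releases, `~=`, wildcards and so on), which the third-party `packaging` library implements, and the function names below are hypothetical.

```python
def parse_version(text):
    """Turn '4.66.2' into (4, 66, 2). Only plain dotted integers are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '>=4.66.2,<5'."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are tried first so '>=' is not read as '>'.
        for op in (">=", "<=", "==", ">", "<"):
            if clause.startswith(op):
                bound = parse_version(clause[len(op):])
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        # Pad to equal length so that (5,) compares like (5, 0, 0).
        width = max(len(candidate), len(bound))
        a = candidate + (0,) * (width - len(candidate))
        b = bound + (0,) * (width - len(bound))
        ok = {
            ">=": a >= b,
            "<=": a <= b,
            "==": a == b,
            ">": a > b,
            "<": a < b,
        }[op]
        if not ok:
            return False
    return True


if __name__ == "__main__":
    for v in ("4.66.1", "4.66.2", "4.67.0", "5.0.0"):
        print(v, satisfies(v, ">=4.66.2,<5"))
```

The range accepts any 4.x release from 4.66.2 onward and rejects the next major version, which is the behaviour the npm-style default discussed in this thread relies on.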

As I already replied to another user, I know that this exists, and it doesn't have anything to do with my point. So sadly your suggestion doesn't help in any way.
Yeah, I guess I should have done a pip freeze to specify the versions in the requirements file. I wasn't thinking ahead.

Turns out one dependency had 3 major releases in the span of a year! (Which basically confirms what I was saying, though I don't know how typical that is.)

Third-party package maintainers usually don't do as good a job of maintaining backwards compatibility, or of doing it right, as the core library maintainers do. That's why you were able to upgrade from 2 to 3 by changing print to print(), but dependencies you install with pip sometimes break for inexplicable reasons.
So pin your deps? Language backwards compatibility and an API from some random package changing are completely distinct.
> So pin your deps?

Which is, fairly often, pinning your python version.

Pinning deps is discouraged by years of Python practice. And going back to an old project and finding versions that work, a year or more later, might be nigh on impossible.

Last week I was trying to install snakemake via Conda, and couldn't find any way to satisfy dependencies at all, so it's not just pypi, and pip tends to be one of the more forgiving version dependency managers.

It's not just Python: trying to get npm to load the requirements has stopped me from compiling about half of the projects I've tried to build (which is not a ton of projects). And CRAN in the R universe can have similar problems as projects age.

> Pinning deps is discouraged by years of Python practice.

I'm not sure it is discouraged so much as just not what people did in Python-land for a long time. It's obviously the right thing to do, it's totally doable, it's just inertia and habit that might mean it isn't done.

> I'm not sure it is discouraged so much as just not what people did in Python-land for a long time. It's obviously the right thing to do, it's totally doable, it's just inertia and habit that might mean it isn't done.

Pinning is obviously the wrong thing: it only works if everyone does it, and if everyone does it then making changes becomes very hard. The right thing is to have deterministic dependency resolution so that dependencies don't change under you.

When they suggest you pin your dependencies, they don't just mean your direct dependencies, but rather all transitive dependencies. You can take this further by having a lock file that accounts for different Python versions, operating systems, and CPU architectures – for instance, by using uv or Poetry – but a simple `pip freeze` is often sufficient.
That works for your project, but then nobody can include you as a library without conflicts.

But having that lock file will allow somebody to reconstruct your particular moment in time in the future. It's just that those lock files do not exist for 99.9% of Python projects in time.

For some reason the "secure" thing to do is considered to be to pin everything and then continuously bump everything to latest, to get the security fixes.

At which point one might directly not pin, but that's "insecure" (https://scorecard.dev/)

That doesn’t match my experience at all. I have many Python projects going back years that all work fine with pinned dependencies
It took me a few days to get some old Jupyter Notebooks working. I had to find the correct older version of Jupyter, the correct version of every plugin/extension that the notebook used, and then the correct version of every dependency of these extensions. The only way to get it working was a bunch of pinned dependencies.
Had they been properly pinned before, you would not have had to work for a few days. Code in a Jupyter notebook is unlikely to be relied upon elsewhere, so it is perfectly fine to make it always use the exact same versions (verified by checksums, in whatever tool you are using).
Pinning by ‘pip freeze’ only works for a specific platform since there are often differences in the wheels available, particularly for older things.
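The platform dependence shows up in wheel filenames, which encode a Python tag, an ABI tag and a platform tag. The sketch below splits a filename into those parts following the wheel naming convention; the package name `examplepkg` and both filenames are made up for illustration, and the helper names are hypothetical.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:  # an optional build tag sits after the version
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def is_platform_specific(filename):
    """True when the wheel only installs on particular platforms."""
    return parse_wheel_filename(filename)["platform"] != "any"


if __name__ == "__main__":
    for wheel in (
        "examplepkg-1.2.3-py3-none-any.whl",
        "examplepkg-1.2.3-cp312-cp312-manylinux_2_17_x86_64.whl",
    ):
        print(wheel, is_platform_specific(wheel))
```

A frozen `examplepkg==1.2.3` line says nothing about which of these files exist, so the same pin can install from a wheel on one machine and fall back to a source build, or fail, on another.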

Conda is its own kettle of fish, especially given the different channels and conda-forge, which you have to remember.

I’m curious as to which packages you are unable to find older versions for. You mention snakemake, but that doesn’t seem to have any sort of issues.

https://pypi.org/project/snakemake/#history

It's not about finding old packages, it's about not finding the magical compatible set of package versions.

Pip is nice in that you can install packages individually to get around some version conflicts. But with conda and npm and CRAN I have always found myself stuck without being able to install dependencies after 15 minutes of mucking.

It's rare that somebody has left the equivalent of the output of a `pip freeze` around to document their state.

With snakemake, I abandoned conda and went with pip in a venv, without filing an issue. Perhaps it was user error from being unfamiliar with conda, but I did not have more time to spend on the issue, much less doing the research to be able to file a competent issue and follow up later on.

It’s a little hard for me to talk about Python setups which don’t use Poetry as that is basically the standard around here. I would argue that not controlling your packages regardless of the package manager you use is very poor practice.

How can you reasonably expect to work with any tech that breaks itself by not controlling its dependencies? You’re absolutely correct that this is probably more likely to be an issue with Python, but that’s the thing with freedom. It requires more of you.

Yes and no.

There are different types of dependencies, and there are different rules for them, but here's an overview of the best practices:

1. For applications, scripts and services (i.e. "executable code"), during development, pin your direct dependencies; ideally to the current major or minor version, depending on how much you trust their authors to follow SemVer. Also make sure you regularly update and retest the versions to make sure you don't miss any critical updates.

You should not explicitly pin your transitive dependencies, i.e. the dependencies of your direct dependencies -- at least unless you know specifically that certain versions will break your app (and even then it is better to provide a range than a single version).

2. For production builds of the above, lock each of your dependencies (including the transitive ones) to a specific version and source. It is not really viable to do it by hand, but most packaging tools -- pip, Poetry, PDM, uv... -- will happily do that automatically for you. Unfortunately, Python still doesn't have a standard lock format, so most tools provide their own lock file; the closest thing to a standard we have at the moment is pip's requirements file [0].

Besides pinned versions, a lock file will also include the source where the packages are to be retrieved from (pip's requirements file may omit it, but it's then implicitly assumed to be PyPI); it can (and should, really) also provide hashes for the given packages, strengthening the confidence that you're downloading the correct packages.

3. Finally, when developing libraries (i.e. "redistributable code"), you should never pin your dependencies at all -- or, at most, you can specify the minimum versions that you know work and have tested against. That is because you have no control over the environment the code will eventually be used and executed in, and arbitrary limitations like that might (and often will) prevent your users from updating some other crucial dependency.

Of course, the above does not apply if you know that a certain version range will break your code. In that case you should most definitely exclude it from your specification -- but you should also update your code as soon as possible. Libraries should also clearly specify which versions of Python they support, and should be regularly tested against each of those versions; it is also recommended that the minimal supported version is regularly reviewed and increased as new versions of Python get released [1].
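The hash check mentioned in point 2 can be sketched in a few lines. This is an illustration of the idea only, not how any particular installer implements it: the payload stands in for a downloaded package file, the `sha256:<hex>` string mirrors the form used with `--hash` in pip requirements files, and the function names are hypothetical.

```python
import hashlib


def sha256_of(data):
    """Return the hex SHA-256 digest of a bytes payload."""
    return hashlib.sha256(data).hexdigest()


def matches_pinned_hash(data, pinned):
    """Compare a payload against a pinned value of the form 'sha256:<hex>'."""
    algorithm, _, expected = pinned.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    return sha256_of(data) == expected.lower()


if __name__ == "__main__":
    # Stand-in for the bytes of a downloaded wheel or sdist.
    payload = b"pretend this is a package archive"
    pinned = "sha256:" + sha256_of(payload)  # what a lock file would record

    print(matches_pinned_hash(payload, pinned))              # same bytes
    print(matches_pinned_hash(payload + b"extra", pinned))   # altered bytes
```

Recording the digest at lock time and re-checking it at install time is what turns a version pin into a statement about exact file contents.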

For more clarity on abstract vs concrete dependencies, I recommend the great article by Donald Stufft from 2013 [2]; and for understanding why top-binding (i.e. limiting the top version pin) should be avoided there is a lengthy but very detailed analysis by Henry Schreiner [3].

[0] https://pip.pypa.io/en/stable/reference/requirements-file-fo...

[1] https://devguide.python.org/versions/

[2] https://caremad.io/posts/2013/07/setup-vs-requirement/

[3] https://iscinumpy.dev/post/bound-version-constraints/

I reckon you're aware of it, but they're actively discussing a lock file format PEP and I'm quite hopeful this time it will actually get accepted

https://discuss.python.org/t/pep-751-now-with-graphs/69721

In a Poetry lock file, transitive dependencies are automatically locked and thereby pinned. It ensures that you get the same thing each time, or get an error about hashsums not matching when something suspicious is going on, which would be worth raising an issue on the repo if none exists.
> In a poetry lock file transitive dependencies are automatically locked and thereby pinned

That is true for all formats of lock files, by definition.

The Python 2 to 3 thing was worse when they started: people who made the mistake of falling for the rhetoric to port to python3 early on had a much more difficult time as basic things like u"" were broken under an argument that they weren't needed anymore; over time the porting process got better as they acquiesced and unified the two languages a bit.

I thereby kind of feel like this might have happened in the other direction: a ton of developers seem to have become demoralized by python3 and threw up their hands in defeat of "backwards compatibility isn't going to happen anyway", and now we live in a world with frozen dependencies running in virtual environments tied to specific copies of Python.

Honestly I think a big issue is that it’s not just legacy code, it’s also legacy code which depends on old dependency versions. Eg there’s an internal app where I work stuck on 3.8.X because it uses deprecated pandas syntax and is too complicated to rewrite for a newer version easily.
> Python 2 code running on Python 3 by just changing print to print().

This was very much the opposite of my experience. Consider yourself lucky.

This migration took the industry years because it was not that simple.
> This migration took the industry years because it was not that simple.

It was not that simple, but it was not that hard either.

It took the industry years because Python 2.7 was still good enough, and the tangible benefits of migrating to Python 3 didn't justify the effort for most projects.

Also some dependencies such as MySQL-python never updated to Python 3, which was also an issue for projects with many dependencies.

Maybe his application was a hello world!
What APIs were broken? They couldn't be in the standard library.

If the dependency was in external modules and you didn't have pinned versions, then it is to be expected (in almost any active language) that some APIs will break.

> They couldn't be in the standard library.

Why not? Python does make breaking changes to the standard library when going from 3.X to 3.X+1 quite regularly.

Only usually after YEARS of deprecation warnings
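Those warnings are easy to miss, since Python hides most DeprecationWarnings by default unless they are triggered by code in `__main__`. A minimal sketch of surfacing them early, by promoting the warning to an exception, follows; `old_api` and `new_api` are hypothetical stand-ins for a deprecated function and its replacement.

```python
import warnings


def new_api():
    """Hypothetical replacement function."""
    return 42


def old_api():
    """Hypothetical function that is scheduled for removal."""
    warnings.warn(
        "old_api() is deprecated; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api()


def call_strictly(func):
    """Call func with DeprecationWarning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func()


if __name__ == "__main__":
    print(call_strictly(new_api))
    try:
        call_strictly(old_api)
    except DeprecationWarning as exc:
        print("would break in a future release:", exc)
```

Running a test suite with `python -W error::DeprecationWarning` applies the same filter process-wide, which gives a project notice of upcoming removals while the old code path still works.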
Async was a good example where that didn’t happen, but to be fair to the maintainers, it was fairly experimental
Python drops modules from the standard library all the time these days. It's a pain in the ass.

Now even asyncore is gone -_-' Have fun rewriting all the older async applications!

Sadly, several python projects do not use semantic versioning, for example xarray [0] and dask. Numpy can make backward incompatible changes after a warning for two releases[1]. In general, the python packaging docs do not really read as an endorsement of semantic versioning [2]:

> A majority of Python projects use a scheme that resembles semantic versioning. However, most projects, especially larger ones, do not strictly adhere to semantic versioning, since many changes are technically breaking changes but affect only a small fraction of users...

[0] https://github.com/pydata/xarray/issues/6176

[1] https://numpy.org/doc/stable/dev/depending_on_numpy.html

[2] https://packaging.python.org/en/latest/discussions/versionin...

Even after it finished burning a billion lines of python 2 code (largely unnecessarily imho) python 3 seems to retain an unhealthy contempt for backward compatibility. I have had similar experiences where python 3 projects require a particular version of python 3 in order to run.

I like python (and swift for that matter) but I don't like the feeling that I am building on quicksand. Java, C++, and vanilla javascript seem more durable.

> I pointed out that I've often gotten Python 2 code running on Python 3 by just changing print to print().

...

> I wrote last year and it turned out that a bunch of my dependencies had changed their APIs

these two things have absolutely nothing to do with each other - couldn't be a more apples to oranges comparison if you tried

I ran into both of these things in the same context, which is "the difficulty involved in getting old code working on the latest Python environment", which I understood as the context of this discussion.
I'd drop libraries that behave like that.