by arturkane7 1262 days ago
Good point. I am wondering what the specific reasons are that, say, Java or JS don’t suffer from this, and I'm hoping for rant-free responses that don’t turn into language wars.

In part because Python is often used as an OS language (Debian has it in the base system and a bunch of OS subsystems use it) so if you upgrade some library at the OS level it affects more than just your own code.

In part it is because Python has modules, not libraries. Modules can have executables bundled with them.

In part, it’s because Python packages don’t exactly care about backwards compatibility. By and large, people who publish packages will keep a changelog but won’t keep the same API going for years, so slightly different versions of the same package can give you different results.

Lastly, it is because Python does not have a clear definition of a project. If at the language level I could say “this is the root of my project. Do not use any OS-level libraries. Here are my library dependencies, and they are to be stored in $PROJECT_ROOT/libs/”, then none of this would be required. I have rarely found the version of Python itself to be a big problem for me, but the ability to install multiple versions at the OS level, combined with the above mechanism, would entirely eliminate the need for virtual envs.
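Something close to that per-project layout can be approximated by hand today. A minimal sketch, assuming the dependencies were installed into a hypothetical `libs/` directory under the project root (e.g. with `pip install --target libs ...`); `project_sys_path` is an illustrative helper, not a standard API, and it treats any path containing `site-packages` or `dist-packages` as an OS- or user-level install.

```python
import os


def project_sys_path(project_root, current_path):
    """Build a sys.path-style list that prefers <project_root>/libs.

    Hypothetical helper: drops every site-packages/dist-packages directory
    (where OS-level and user-level installs live), keeps the other entries
    such as the standard library, and puts the project's own libs/ first.
    """
    libs = os.path.join(project_root, "libs")
    shared_markers = ("site-packages", "dist-packages")
    kept = [
        entry
        for entry in current_path
        if entry != libs and not any(marker in entry for marker in shared_markers)
    ]
    return [libs] + kept
```

A script at the project root could run `sys.path[:] = project_sys_path(os.path.dirname(os.path.abspath(__file__)), sys.path)` before importing its dependencies. Matching on directory names is crude, and it does nothing about which interpreter version runs the project.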

A key thing for Node is that NPM (in addition to being first-party and having a standard manifest format) installs all dependencies in the project directory, under node_modules.

People complain about node_modules, but the benefit is that every Node project on a system is isolated automatically and can re-download all of its dependencies trivially (after cleaning them, or after being checked out onto a new machine).
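Part of why that isolation is automatic is Node's module resolution: for a bare `require("x")`, Node looks in the `node_modules` directory next to the importing file, then in each parent directory's `node_modules`, and it never appends `node_modules` to a directory already named `node_modules`. The Python sketch below lists those candidate directories; `node_modules_candidates` is an illustrative name, and Node's global fallbacks (such as `NODE_PATH`) are deliberately left out.

```python
from pathlib import PurePosixPath


def node_modules_candidates(start_dir):
    """List the node_modules directories searched for a bare import.

    Simplified sketch: walk from start_dir up to the filesystem root,
    nearest first, skipping directories that are themselves node_modules.
    """
    candidates = []
    current = PurePosixPath(start_dir)
    for directory in [current, *current.parents]:
        if directory.name == "node_modules":
            continue
        candidates.append(str(directory / "node_modules"))
    return candidates
```

Because the nearest `node_modules` wins, a project's own dependencies shadow anything further up the tree without any global configuration.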

A Node project's system-wide dependencies are:

1. A new-enough Node installation, and

...that's it

The same is true for Go and Rust and other modern languages. Python is the odd one out.

Go, Rust, and JS stand out in having package management solutions that avoid the diamond dependency problem. Most other languages suffer from the same problem as Python, in that they cannot have more than one version of the same package globally. Dependency resolution for Ruby/Dart/Julia is an NP-complete problem that requires constraint solving, which often fails to find a solution if your dependencies are complex.
For what it’s worth, the Poetry version solver works in the same way as Dart’s: https://github.com/python-poetry/poetry/blob/286f4ddb70394dd...
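To make the one-version-per-package constraint concrete, here is a minimal backtracking resolver over a hypothetical index (`INDEX`, `resolve`, and the integer versions are all illustrative). `a` needs `c` 1 and `b` needs `c` 2, which is exactly the diamond: with only one version of `c` allowed, the search runs out of options and returns `None`. Real solvers such as PubGrub (used by Dart's pub) prune far more cleverly, but the underlying problem is still NP-complete.

```python
# Hypothetical index: package -> {version: {dependency: set of allowed versions}}
INDEX = {
    "app": {1: {"a": {1}, "b": {1}}},
    "a": {1: {"c": {1}}},
    "b": {1: {"c": {2}}},
    "c": {1: {}, 2: {}},
}


def resolve(index, root, root_version):
    """Pick exactly one version per package satisfying every constraint.

    Returns {package: version}, or None when no assignment exists.
    Plain depth-first backtracking, trying newest versions first.
    """

    def solve(chosen, pending):
        if not pending:
            return chosen
        (name, allowed), rest = pending[0], pending[1:]
        if name in chosen:
            # One version per environment: the earlier pick must fit this edge too.
            return solve(chosen, rest) if chosen[name] in allowed else None
        for version in sorted(allowed & set(index[name]), reverse=True):
            deps = list(index[name][version].items())
            result = solve({**chosen, name: version}, rest + deps)
            if result is not None:
                return result
        return None  # every candidate version led to a conflict

    return solve({}, [(root, {root_version})])
```

If `b` is relaxed to accept `c` 1 or 2, the same search succeeds with `c` pinned to 1.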
You can still face the diamond problem in NPM; it just deals with it internally (usually by installing both versions under node_modules). But I don't see that as related to whether dependencies are project-specific or system-wide.
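A rough sketch of the "install both versions" behaviour, using a hypothetical `npm_style_layout` helper and only one level of transitive dependencies: the first version of a package claims the top-level `node_modules`, and a conflicting version is nested under the package that asked for it. Since Node checks the nearest `node_modules` first, each dependent then loads the copy it needs. Real npm's placement and deduplication rules are more involved than this.

```python
def npm_style_layout(deps):
    """Map install paths to versions in a nested node_modules tree.

    deps: {direct dependency: (version, {transitive name: version})}.
    Simplified hoisting: first version seen goes to the top level, and a
    conflicting version is nested next to the package that requires it.
    """
    layout = {}
    # Direct dependencies claim the top level first.
    for name, (version, _) in deps.items():
        layout[f"node_modules/{name}"] = version
    for name, (_, transitive) in deps.items():
        for dep, version in transitive.items():
            top = f"node_modules/{dep}"
            if layout.setdefault(top, version) != version:
                # Conflict: keep a private copy under the dependent.
                layout[f"node_modules/{name}/node_modules/{dep}"] = version
    return layout
```

For `a` needing `c@1.0.0` and `b` needing `c@2.0.0`, this yields `node_modules/c` at 1.0.0 plus `node_modules/b/node_modules/c` at 2.0.0.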
To be fair, trying to figure out Maven and proper Java IDE integration (which is essential) can also be pretty miserable. Even though there aren't as many tool choices to deal with, the ones that exist can get really complicated and have poor documentation.
I would say that one important factor is the tooling. Instead of having one or two standard tools, as seen in Java (maven/gradle), Node (npm/yarn), .NET (dotnet CLI and MSBuild/maybe FAKE), Python has a plethora of opinionated tools. venv+pip is built-in, but doesn’t handle any of the “where to put the venv” parts, and doesn’t handle making packages installable (and the classic solution is setuptools). There are many competing projects, like pipenv (which does not work at all for libraries), poetry (which uses a non-standard way of specifying metadata and dependencies, and isn’t too friendly with the rest of the packaging world), flit, hatch, etc.

Why are there so many competing tools? Why can’t the community just pick one and make it a good tool for all use-cases? There’s an organization called “Python Packaging Authority”, and it maintains almost all tools mentioned in the previous paragraph (except the stdlib venv, and poetry). If they’re an authority, they should just say “___ is the way to go, we’ll add the missing use-cases to it, here’s a tool to migrate everything else to ___”. Instead, they have done things that support the proliferation of tools, like PEP 517 (which is a standard API for package managers to talk to build tools).
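For context on what PEP 517 standardizes: `pyproject.toml` names a backend under `[build-system]` as `build-backend` (for example `setuptools.build_meta` or `flit_core.buildapi`) using entry-point style `module:object` syntax, and any frontend can import that object and call hooks such as `build_wheel()` and `build_sdist()`. A minimal sketch of only the lookup step, with `load_build_backend` as a hypothetical helper; it skips everything else a real frontend does, such as setting up an isolated build environment and running hooks in a subprocess.

```python
import importlib


def load_build_backend(spec):
    """Resolve a PEP 517-style "module" or "module:object.attr" string."""
    module_name, _, object_path = spec.partition(":")
    backend = importlib.import_module(module_name)
    # Walk the optional dotted attribute path after the colon.
    for attr in filter(None, object_path.split(".")):
        backend = getattr(backend, attr)
    return backend
```

With setuptools installed, `load_build_backend("setuptools.build_meta")` would return the module whose `build_wheel` and `build_sdist` functions a frontend calls. That small shared interface is what lets many frontends and backends coexist.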

Another deficiency in the Python package management system is the fact that you need virtual environments to get things done, and that they are often finicky. Some of the modern tools hide the venv from you, but this gets less practical if you’re trying to use things from within scripts, or trying to point your WSGI server at the venv, or in multi-user scenarios. Some of the modern tools put it in .venv and manage it for you, but venvs can randomly break in cases the tool might not be aware of (e.g. with some system package managers, upgrading Python to a new minor version would break venvs because the symlink targets moved).
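That kind of breakage can at least be detected. A venv records the directory of its base interpreter in the `home` key of `pyvenv.cfg`, and on POSIX `bin/python` is usually a symlink into that installation. A heuristic sketch (hypothetical `venv_problems` helper, POSIX layout assumed) that flags either one going stale:

```python
import os


def venv_problems(venv_dir):
    """Return reasons a POSIX venv looks broken; an empty list means none found.

    Checks two things that go stale when the base interpreter moves:
    the `home` key in pyvenv.cfg and the bin/python symlink.
    """
    problems = []
    home = None
    try:
        with open(os.path.join(venv_dir, "pyvenv.cfg"), encoding="utf-8") as cfg:
            for line in cfg:
                key, sep, value = line.partition("=")
                if sep and key.strip() == "home":
                    home = value.strip()
    except FileNotFoundError:
        return ["missing pyvenv.cfg"]
    if home is None:
        problems.append("pyvenv.cfg has no 'home' key")
    elif not os.path.isdir(home):
        problems.append(f"base interpreter directory is gone: {home}")
    python = os.path.join(venv_dir, "bin", "python")
    # os.path.exists follows the link, so a dangling link reports False.
    if os.path.islink(python) and not os.path.exists(python):
        problems.append(f"dangling interpreter link: {python} -> {os.readlink(python)}")
    return problems
```

A check like this only tells you the venv needs rebuilding; it cannot repair it.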

Node solves this by having a `node_modules` directory. .NET does stuff at build-time that most people don’t need to care about, and then just puts the .dll files for your dependencies next to the .dll with your thing, and the loader looks in the directory your code came from. Python has a proposal for `__pypackages__` [0], but it’s been sitting there since May 2018 without much progress.

[0] https://peps.python.org/pep-0582/
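PEP 582 proposed that the interpreter look for packages in a `__pypackages__/X.Y/lib` directory next to the script being run (X.Y being the Python version), preferring it over user and global site-packages. CPython does not do this lookup itself, so the sketch below (hypothetical `pypackages_dir` helper) only computes the path the PEP describes.

```python
import os
import sys
from pathlib import Path


def pypackages_dir(script_path, version_info=sys.version_info):
    """Return <script dir>/__pypackages__/X.Y/lib, the layout PEP 582 proposed."""
    version = f"{version_info[0]}.{version_info[1]}"
    script_dir = Path(os.path.abspath(script_path)).parent
    return script_dir / "__pypackages__" / version / "lib"


# Manual opt-in, since the interpreter will not do this on its own:
# sys.path.insert(0, str(pypackages_dir(__file__)))
```

The version segment is what would let one project directory hold separate package sets for different interpreters side by side.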

There are a few reasons for the current situation in terms of the plethora of tools.

One, Python and its packaging story are old (remember, Python predates Linux). That has given folks plenty of time to come up with their own solutions, especially since Python also predates widespread internet usage (PyPI has not always been around).

Two, a lack of standards. Tying back into the "old" point, not everything was initially designed. There has been work to chip away at this and get more standards behind things, but getting folks to update their packaging code is *hard*; most people copy their packaging code from their last project and don't really try to update it. Getting changes to propagate through an ecosystem as large as Python's takes years; a decade is the typical time frame considered for complete uptake.

Three, the lack of standards means tools come up with their own solutions, which then aren't compatible with anyone else's. So when someone innovates, it can very quickly get locked up behind a single tool, and folks end up reinventing the wheel for various reasons (e.g. lock files). Different approaches lead to different opinions, which lead to folks choosing different tools for different reasons. And when that happens, there ends up being a lack of consensus.

And four, this is almost entirely driven by volunteers with very few people paid in any way to work on this stuff (I think there might be like 2 people who contribute to pip, 2.5 folks for PyPI).

And good luck telling someone that their preferred workflow isn't the "chosen" workflow that the whole community is going to switch to. People don't like being told they are going to have to change, especially if they believe their approach is superior for whatever reason. Multiply that by the size of the Python community and you can see the pitchforks quite clearly on the horizon.

Now that isn't to say work isn't being done to improve the situation. The PyPA side of packaging has been working on standards for quite some time and is getting some traction with them (e.g. pyproject.toml is a great example of that). But we do have some more standards to work out. We are also regularly discussing how to come up with some singular tool/UX that people can get behind for most use cases, but see the above comments about how big a challenge that is, and thus why it hasn't happened yet. But people are aware and trying to figure all of this stuff out.

I’m not exactly some hip and happening ‘on the inside’ developer, but I’m aware enough of the various Big Personalities in the Python community to be quite confident in my belief that politics played a role in PyPA not taking a more consistent stance.

Which is…not at all justification, certainly.

For JS, which I'm interpreting as Node because I think that's the real parallel, the answer is NVM and NPM, or alternatives such as Yarn or pnpm. But even the default package manager is great compared to Python's.

I haven't tried it, but my understanding is that Poetry is a bit more like npm.