Hacker News
by cderwin 3431 days ago
This is great, but sometimes I think that python needs a new package manager from scratch instead of more tools trying to mix and mash a bunch of flawed tools together in a way that's palatable to most of us. Python packaging sucks, the whole lot of it. Maybe I'm just spoiled by rust and elixir, but setuptools, distutils, pip, easy_install, all of it is really subpar. But of course everything uses pypi and pip now, so it's not like any of it can actually be replaced. The state of package management in python makes me sad. I wish there were a good solution, but I just don't see it.

Edit: I don't mean to disparage projects like this and pipfile. Both are great efforts to bring the packaging interface in line with what's available in other languages, and might be the only way up and out of the current state of affairs.

I think python packaging has gotten LOTS better in the last few years. I find it quite pleasurable to use these days.

From binary wheels (including on different Linux architectures), to local caching of packages (taking LOTS of load off the main servers), to the pypa organisation on GitHub [0], to `python -m venv` working.

Also lots of work around standardising things in peps, and writing documentation for people.

I would like to applaud all the hard work people have done over the years on python packaging. It really is quite nice these days, and I look forward to all the improvements coming up (like pipenv!).

I'd suggest people check out fades [1] (for running scripts and automatically downloading dependencies in a venv), as well as conda [2], the alternative package manager.

[0] https://github.com/pypa/

[1] https://fades.readthedocs.io/en/release-5/readme.html#what-d...

[2] http://conda.pydata.org/docs/intro.html

+1. Relative to what we had before, it's so much better. But compared to the JS/Rust ecosystem, we are behind.

Now it's hard to compete with JS on some stuff: it's the only language on the most popular dev platform (the web) and it has one implicit standardized async model by default.

It's hard to compete with rust on some stuff: it's compiled and fast, can provide standalone binaries easily and has a checker that can avoid many bugs.

But this. The package manager. We can compete. And yet we are late.

It's partially my fault since it's a project I had in mind for years and never took the time to work on. It's partially everybody's fault I guess :)

It was definitely your fault. Thanks for nothing.
:) In the open source world, and given my skills, I always feel a bit responsible when a tool doesn't exist.
I'd strongly disagree with Python being behind relative to JS. They're roughly on par (with Python being slightly ahead due to smarter caching and more deterministic builds). Unfortunately neither does namespacing nor good versioning, especially in the cache. Take for example Maven (from Java), it caches namespaced and versioned dependencies and is pretty reliably deterministic in the build process.

Versus node, which makes horrible assumptions that allow dependencies of libraries to be resolved and dumped into the same library namespace as your application. It also allows users to use the shrinkwrap feature, which imposes the dependency's settings on me (say, for example, I want to use my own npmjs proxy, this is bypassed...)

Wow, Javascript, really? I am guessing you don't actually work with NPM a lot.
One suspects it's you who hasn't distributed or installed many modules on either python or node. So many of the problems that python has, simply don't exist for node, because it finds modules in a bottom-up hierarchical fashion. That allows a single app or module to use modules that in turn use different versions of other modules, and not to worry about what other modules are doing, or how other modules are installed, or how node is installed, or what version of node is installed. This prevents the traditional "dependency hell" that has plagued devs for decades. Thanks to tools like browserify and webpack, the browser may also benefit from this organization.
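
For readers who have not seen the bottom-up lookup described above, here is a rough Python sketch of how the candidate `node_modules` directories are enumerated for a `require()` call. It is a simplification that assumes POSIX-style paths and ignores `NODE_PATH` and global folders; the function name `node_modules_paths` and the example paths are made up for illustration.

```python
from pathlib import PurePosixPath
from typing import List


def node_modules_paths(start: str) -> List[str]:
    """List the directories searched for a module, nearest first.

    Walks from the requiring file's directory up to the filesystem root,
    proposing a 'node_modules' folder at each level.
    """
    origin = PurePosixPath(start)
    candidates = []
    for directory in [origin, *origin.parents]:
        # Never look for node_modules/node_modules.
        if directory.name == "node_modules":
            continue
        candidates.append(str(directory / "node_modules"))
    return candidates


if __name__ == "__main__":
    # Hypothetical layout: module "a" is installed inside an app at /app.
    for path in node_modules_paths("/app/node_modules/a/lib"):
        print(path)
```

Because the nearest directory wins, module "a" can carry its own copy of a dependency under `/app/node_modules/a/node_modules` while the app uses a different version from `/app/node_modules`.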

On top of all that, npm itself just does so many things right. It's quite happy to install from npm repos, from dvcs repos, from regular directories, or from anything that looks like a directory. It just needs to find a single file called "package.json". It requires no build step to prepare a module for upload to an npm repo, but it easily allows for one if that's necessary. package.json itself is basically declarative, but provides scripting hooks for imperative actions if necessary. At every opportunity, npm allows devs to do what they need to do, the easy way.

In a sense, node and npm are victims of their own quality. The types of "issues" (e.g. too many deps, too many layers of deps, too many versions of a particular dep, deps that are too trivial, etc.) about which anal code puritans complain with respect to node simply couldn't arise on other platforms, because dependency hell would cause the tower of module dependencies to collapse first. node happily chugs along, blithely ignoring the "problems".

Personally, I used to be able to build python packages for distribution, but since I've been spoiled by node and npm for several years I've found I simply can't do that for python anymore. It is so much harder.

npm has its own special problems. disclaimer: what I'm talking about in this post is at least six months old, which in node/npm/js world is ancient history.

> it finds modules in a bottom-up hierarchical fashion. That allows a single app or module to use modules that in turn use different versions of other modules, and not to worry about what other modules are doing

To my understanding, if your app transitively depends on package foo-1.2 in thirty different places [0], there will be thirty copies of foo-1.2 on disk under node_modules/ . Each package reads its very own copy of foo-1.2 when it require()s foo.

On a large app, that adds up to a lot of inodes ("why does it say my filesystem is full? there's only 10G of stuff on my 80G partition!" because it's used up all its inodes, not its bytes.) and a _lot_ of unnecessary I/O. The second through thirtieth copies of foo-1.2 don't come from the kernel's block cache "for free", they come from spinning rust (or if you're lucky, the dwindling number of IOps your SSD can choke out. Do you pay money for provisioned IOps?).

[0] and thirty is a lowball number for some projects, especially given the community's preference to require "leftpad" or whatever instead of writing a couple lines in their own projects

> ...what I'm talking about in this post is at least six months old...

Haha npm@3 was out June 2015. b^)

I agree that it would have been better, on balance, for previous versions to have created hard links to already-installed modules. Actually that wouldn't be a bad option to have even now, since debugging is often easier when one has a deep directory structure to explore rather than hundreds of random names in the top-level node_modules directory. That is, if I know the problem is in module foo, I can pushd to node_modules/foo, find the problematic submodule again, and repeat until I get all the way to the bottom. [EDIT: it occurs to me that having all these hard links would make e.g. dependency version updates easier, since un-updated dependencies wouldn't have to be recopied, unix-stow-style.]

To me, the more amusing file descriptor problem is caused by the module "chokidar", which when used in naive fashion tries to set up watches on all 360 files and directories created by itself and its own 55 dependencies. At that point it's real easy to run out of file watches altogether. Some of the utilities that call chokidar do so while ignoring node_modules, but many do not.

Your concern is somewhat ancient history in the node/npm world. NPM >= 3 flattens the dependency hierarchy as much as it thinks possible. (3.0.0 was released to beta in Jun 2015; NPM >= 3 has been bundled with Node LTS since "Boron" in Oct 2016)

Subsequent versions of NPM continue to improve upon this flattening effort.

I work with NPM every day, and it works swimmingly. `npm install packagename --save`: it doesn't get much easier than that.
This is actually one of the big problems, I think: Python packaging involves knowing a number of different things and reading various resources to get the full picture.

Recently, I built a small CLI tool in Python, and learned all of the bits needed to build, test and package my application "the right way". I knew Python syntax before, but it was a lot of effort to set this up. The difference in the experience between Python and Rust or .NET Core is actually shocking, and most of it isn't down to anything that Python couldn't do, just the current state of the tooling.

Could you provide some specific examples of the "shocking" difference?
Python best practice: figure out the correct directory structure by reading docs and looking at GitHub repositories, learn how to write setup.py & setup.cfg & requirements.txt & MANIFEST.in files, set up py.test and tox (because Python 2 still lives), write your README in RST format (as used by nothing else ever), and for bonus points: write your own Makefile. Get depressed when you realize that target platforms either don't have Python or have the wrong version.

Rust: type "cargo new", README and doc comments in Markdown, type "cargo test" and "cargo build".

I'm being deliberately snarky, but you get the point: there has been a slow accretion of complexity over a very long time, and most of it is not the language itself.

Thank you. It is exactly what I have asked for.

The Python best practice: you do not need "correct directory structure" to start programming in Python--there is no boilerplate structure to create (and that would require a tool): REPL, Jupyter notebook in a browser, a single file (module) is enough in a simple case.

There is no single "correct directory structure" for more complex cases either e.g., a Django web app and Ubuntu's Appindicator would probably have different structures and it is likely that there are tools that create corresponding project skeletons such as "django-admin startproject mysite".

Usually, the packaging/deploying are not nice & neat and differ depending on your project, chosen platform--either you follow an example or you have to study a never ending list of incomplete/contradicting standards/semi-documented rules & conventions--it is unrelated to Python (even if tools such as Ansible, Salt are implemented in Python). Naturally, there are things that can be improved.

btw, reStructuredText is a great text format for documenting Python projects (it scales from a simple README.rst to a whole book). You can always convert README.md written in your favorite Markdown flavour using pandoc or a similar tool. Markdown is great for a short human-generated plaintext content common on Github, Stack Overflow.

> there is no boilerplate structure to create (and that would require a tool): REPL, Jupyter notebook in a browser, a single file (module) is enough in a simple case.

Rhetorical question: is Python a scripting language or an applications language?

If it's a scripting language, then yes, a single file is all that you need. Otherwise, you need a directory structure with various support files to be able to create and build your CLI tool or library, and you need to follow conventions if you are going to publish your work as Open Source. Which in turn requires you to know which tools to use, and what they require.

Go raised the bar here, I think. Rust and .NET Core both follow the Go idea of having one top level tool that gives you what you need to test and package your library or CLI application. That means that new users can get success with at least two basic use cases very rapidly and with little effort. Which in turn means that the previous status quo is no longer so tolerable.

> btw, reStructuredText is a great text format for documenting Python projects (it scales from a simple README.rst to a whole book).

Sure, the big problem with RST is that it lost the fight. Every developer who uses GitHub has to know a little Markdown, not RST or something else. For larger documents, AsciiDoc is probably the thing. I used to be a Textile loyalist, myself, but stopped fighting against the tide and switched to Markdown some years ago.

`setup.py` is shockingly awful compared to most other solutions.
Conda is great software but the Conda/Anaconda ecosystem is a mess.
Can you elaborate?
Okay - maybe I'm missing something, but pip is the only Python package manager I've ever used. And it's basically "pip install xxx", "pip install --upgrade xxx", "pip show xxx", "pip list", "pip uninstall xxx"

I'm curious what I've been missing about pip that makes it problematic - I've never used the other tools you mentioned (setuptools/distutils/easy_install) - so I can't comment on them, but, on the flip side, I've never had to use them, so maybe my requirements are somewhat simpler than yours.

One thing is good dependency management. Right now if you want to upgrade your Python version, or one of your packages, it's a mountain of manual work. There is nothing in the stack helping you with the dependency graph.

Another thing is providing a standalone build. Something you can just ship without asking the client to run commands in the terminal to make it work. I use Nuitka (http://nuitka.net/) for this. It's a fantastic project, but man it's a lot of work for something that works out of the box in Go or Rust.

One last thing is to generate packages for OS (msi/deb/rpm/dmg/snap). Your sysadmin will like you. Pex (https://pypi.python.org/pypi/pex) is the closest, but not very standard.

Other pet peeves of mine:

- you can't easily move virtualenvs;

- creating a setup.py is very hard for a beginner and has numerous traps;

- setup.py files are executable. Meh.

- what's with this setup.cfg containing 2 lines? And MANIFEST.in being a separate file? Why do I have to put conf in tox.ini as well? And one for each of my linters? I want ONE setup.cfg file with all config for all tools for my project inside and be done with it. TOML can handle rich sections, just stop creating new files.

- accessing files with pkg_resources is way harder than it should be. I made a wrapper for this (http://sametmax.com/embarquer-un-fichier-non-python-propreme...).

- one place to have __version__, please. I want it readable in my code AND in my package metadata, without having to use regex or have side effects on imports.

- remove the "can't build wheel" message when it's useless. It scares newcomers.

- README is the long_description. Don't make me read it manually.

- how do I vendor dependencies in a clean way?

- install_requires, extras_require, setup_requires, tests_require... Make it one require with hooks and tags and be done with it.

- creating a setup.py test config is way harder than it should be and breaks in CI on strange edge cases.

- can we get a PEP on the standard project structure and build it into our tools to be done with it? We all have src/package + setup.py on root anyway.

- pip installs packages in the site-packages dir of the python executable it's installed for. It makes sense, and I think Python deals pretty well with the fact you can have various versions installed on the same machine. But God, people are confused by this. Now you can recommend doing "python -m pip", but it's very verbose and it assumes people know what version of Python is behind the "python" executable. On Windows it can be any of them, and they must choose with... yet another command ("py")! pipenv just bypasses that by assuming you want a virtualenv and giving you access to it. It's a very good call.

- pip install --user will create commands you can't use unless you edit your PATH. This makes newcomers go mad.
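
The last point in the list above can be checked from Python itself. The following is a minimal sketch, assuming the per-user scripts directory matches the `posix_user` / `nt_user` install scheme reported by `sysconfig` (some builds, such as macOS framework builds, use a different scheme name); the helper names `user_scripts_dir` and `is_on_path` are made up for illustration.

```python
import os
import sysconfig


def user_scripts_dir() -> str:
    """Directory where a per-user install is expected to place commands.

    Assumption: the scheme is named '<os.name>_user' on this interpreter.
    """
    return sysconfig.get_path("scripts", f"{os.name}_user")


def is_on_path(directory: str, path_env: str) -> bool:
    """Return True if directory appears as an entry of a PATH-style string."""
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = (entry for entry in path_env.split(os.pathsep) if entry)
    return any(
        os.path.normcase(os.path.normpath(entry)) == wanted for entry in entries
    )


if __name__ == "__main__":
    scripts = user_scripts_dir()
    if is_on_path(scripts, os.environ.get("PATH", "")):
        print(f"{scripts} is on PATH")
    else:
        print(f"{scripts} is NOT on PATH; commands installed there won't be found")
```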

Oh my god, you've described every single one of my issues with Python packaging.

The whole setup.py/setup.cfg situation really is ridiculous. Having to import the __version__, read() the README, no markdown support on pypi, MANIFEST / MANIFEST.in files, tox.ini, what a mess.
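
To make the complaint concrete, this is roughly the boilerplate being described: a regex to pull `__version__` out of the package without importing it, plus a manual read of the README. It is only a sketch of a common workaround, not a recommended layout; the helper names and the `src/mypackage` path in the usage note are hypothetical.

```python
import re
from pathlib import Path

# Matches a line such as: __version__ = "1.2.3"
VERSION_RE = re.compile(
    r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", re.MULTILINE
)


def read_version(init_file: Path) -> str:
    """Extract __version__ from a module's source without importing it."""
    match = VERSION_RE.search(init_file.read_text(encoding="utf-8"))
    if match is None:
        raise RuntimeError(f"no __version__ found in {init_file}")
    return match.group(1)


def read_long_description(readme: Path) -> str:
    """Return the README contents for use as the package long_description."""
    return readme.read_text(encoding="utf-8")
```

In a setup.py these would typically be passed as `version=read_version(Path("src/mypackage/__init__.py"))` and `long_description=read_long_description(Path("README.rst"))`.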

This. Particularly the need for a minimum standard project structure.

Pipenv shows its pedigree and looks like a great tool...that also overlaps significantly with conda. What are the use cases that Pipenv addresses better than/in lieu of conda?

It looks like Pipenv does not handle the python install itself or associated non-python libraries. With Conda I can tell it to install Python 3.6 along with FreeTDS (for mssql). Conda lets me do this in one environment.yml file and have it work cross platform. Separate homebrew or apt-get steps are no longer necessary.

That said, pipenv still looks awesome. Any improvement to the python packaging world is a welcome gift.

You don't need to install (ana|mini)conda just to get a package manager, would be why I would use Pipenv over Conda. Miniconda alone requires somewhere close to 400MB of space and comes with a whole bunch of extra things I don't need just to manage packages and virtualenvs.
The miniconda bootstrap of conda is ~20-30 MB (compressed) depending on platform. It contains only conda and its dependencies, like python and requests. It's how you install conda if you want only conda. The 400 MB number is for the Anaconda Distribution, which is a self contained, single-install, get-all package primarily aimed at scientists and engineers.
Don't you mean 45MB windows, and 21 MB OSX?

https://repo.continuum.io/miniconda/

pipenv allows you to completely ignore the virtualenv, like node_modules. It seems like a detail, but after giving a lot of Python and JS trainings, I came to realize newcomers need little bits of help like this.
I kind of want to take your list and write a tool that fixes (or, at least, automatically works around) all of these issues. Good job.
It actually could be several projects:

- one for graph deps;

- one for packaging;

- one for managing your project.

Then you can let people like Kenneth build a big friendly wrapper on top of it.

The only problem with those is that they must be pure Python. Otherwise you will have problems. Wheels are not bulletproof, and while you can get away with tinkering for dependencies, you can't with your package manager. It should work out of the box.

But creating "cargo for Python" is a very, very hard job. And nobody will remember you for it.

I agree with almost all of this, but...

> - you can't easily move virtualenvs;

`virtualenv --relocatable`, though it's weird that it's not the default, yes.

Note: you must run this after you've installed any packages into the environment. If you make an environment relocatable, then install a new package, you must run virtualenv --relocatable again
Nice list! We manage to avoid most problems with distribution by using Docker containers, but it brings its own set of problems and downsides. I would love to have a better solution!

> you can't easily move virtualenvs
I'm not sure whether it qualifies as easy, but you can use virtualenv-mv

https://github.com/brbsix/virtualenv-mv

Honest question: have you seen how the Perl world handles this stuff? Now that I've mostly moved over to Python, the Perl experience (overall package mgmt) seems much, much better.

Note, it doesn't feel terrible in Python land, to me at least. But it was almost a joy working with Perl's packaging system.

No. Does it build a dependency graph and output the best upgrade path?
It does build a dependency graph...But I'm not sure what you mean by 'best' path to upgrade.
Choosing the combination of the most up-to-date compatible lib versions, or telling you that you can't update and showing you the source of the conflict.
> One thing is good dependency management. Right now if you want to upgrade your Python version, or one of your packages, it's a mountain of manual work. There is nothing in the stack helping you with the dependency graph.

There's pip-tools.

pip-tools doesn't solve the problem at all. It will update things to the latest version, cascading from package to package.

That doesn't guarantee your setup will work.

Dependency management is supposed to create a graph of all requirements, with lower and upper version bounds for the runtime and the libs, and find the most up-to-date combination of those.

If a combination can't be found, it should let you know that either you can't upgrade, or suggest alternative upgrade paths.

pip-tools will just happily upgrade your package and leave you with something broken, because it's based on pip, which does that. They don't check mutually exclusive dependency versions, deprecation, runtime compatibility and such. And they don't build a graph of their relations.
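
The behaviour asked for above (collect every bound, prefer newer versions, and report when no combination exists) can be sketched with a small backtracking search. This is purely illustrative and is not how pip or pip-tools work; the `INDEX` data, the package names and the `resolve` function are all invented for the example, and versions are plain integer tuples.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]
# A bound is (inclusive lower version, exclusive upper version).
Bound = Tuple[Version, Version]

# Hypothetical package index: name -> version -> {dependency: bound}.
INDEX: Dict[str, Dict[Version, Dict[str, Bound]]] = {
    "app": {(1, 0): {"web": ((1, 0), (3, 0)), "orm": ((1, 0), (2, 0))}},
    "web": {
        (1, 0): {"http": ((1, 0), (2, 0))},
        (2, 0): {"http": ((2, 0), (3, 0))},
    },
    "orm": {(1, 5): {"http": ((1, 0), (2, 0))}},
    "http": {(1, 9): {}, (2, 1): {}},
}


def resolve(
    todo: List[Tuple[str, Bound]],
    chosen: Optional[Dict[str, Version]] = None,
) -> Optional[Dict[str, Version]]:
    """Return one version per package satisfying every bound, or None."""
    chosen = dict(chosen or {})
    if not todo:
        return chosen
    (name, (low, high)), rest = todo[0], todo[1:]
    if name in chosen:
        # Already pinned: the pin must also satisfy this new bound.
        return resolve(rest, chosen) if low <= chosen[name] < high else None
    # Try newest versions first and backtrack on conflict.
    for version in sorted(INDEX[name], reverse=True):
        if not (low <= version < high):
            continue
        deps = list(INDEX[name][version].items())
        result = resolve(rest + deps, {**chosen, name: version})
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    print(resolve([("app", ((1, 0), (2, 0)))]))
```

In this toy index the newest "web" needs an "http" that "orm" cannot accept, so the search falls back to the older "web". A real tool would also have to explain *why* a combination failed, which this sketch does not attempt.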

It would be even better if the tool ran your project's tests when checking upgrade combinations.

Something that would say: "You can safely upgrade to Django 1.9.12. Upgrading to latest Django 1.10.5 breaks 20 tests."

How can you have an upper bound on compatibility? When a library is released, it knows that it works with version 1.3.2 of its dependency, but how can it ever know it doesn't work with 1.4, unless the developer goes back and re-releases the app?
If the library follows semantic versioning, then you can always declare that you work with everything from the current version to before the next major version.
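
As a small illustration of that rule, the range "current version up to, but not including, the next major version" can be expressed with tuple comparison. This sketch assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build tags, and the function names are made up.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '1.3.2' into (1, 3, 2). Assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def compatible_range(current: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Inclusive lower bound and exclusive upper bound (next major)."""
    low = parse(current)
    high = (low[0] + 1,)
    return low, high


def is_compatible(candidate: str, current: str) -> bool:
    """True if candidate falls inside the range declared for current."""
    low, high = compatible_range(current)
    return low <= parse(candidate) < high


if __name__ == "__main__":
    for candidate in ("1.3.1", "1.4.0", "2.0.0"):
        print(candidate, is_compatible(candidate, "1.3.2"))
```

The declaration is only as good as the dependency's adherence to semantic versioning, which is exactly the objection raised in the replies.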
Python 2.7 <=> Python 3
> That doesn't guarantee your setup will work.

Nothing guarantees your setup will work.

Nice list of warts.
Pip is built on top of setuptools/distutils, so you are using them without even knowing it.
Not atypical; I'm pretty sure apt uses dpkg, and dnf and yum use rpm underneath.
I strongly agree, but this can work now and is a big improvement over what we currently have. So while I would literally pay to see somebody work on a better package manager (which can generate exe, deb and use a conf file instead of .py), this is a good filler.
Why should it generate a deb file? How is this useful on anything but Debian based systems?

Why exe? How do you package libraries using this new tool you are envisioning?

You package libraries the same way as before, although cargo-like dependency handling would be a nice thing. Especially for upgrades.

But a good package manager should ALSO allow you to produce:

- a standalone executable for most OSes;

- a standard package for each major OS (msi, deb, snap, rpm, dmg, etc).

Doing that right now with Python requires you to set up stuff like Nuitka and the like. It works but it's much harder than it should be.

debhelper pretty much automates the process of packaging any standard distutils or setuptools package, Red Hat distributions have templates for packaging Python libraries as well (and rpmdev-newspec python-mypackage will automatically generate an appropriate .spec file).

Windows and OS X are always a pain in the ass, but that's more an issue of the platforms lacking in package management than anything else.

See my point? There is a way to do it, it's just a pain.

Now pipenv centralizes stuff we were doing anyway.

We should have a tool to centralize those as well.

I would hardly call it a pain, it takes me all of 3 minutes to write a .spec for most python packages and from there it's basically 'tito release'. Sure, if I wanted to package for debian-based distributions it'd take a little more time, but it's worth it to make a quality package that a distribution itself can decide to pick up (packagers love other people doing the work for them, though they won't refuse doing it themselves) with minimal effort.
Just curious, what aspects of pip/virtualenv specifically do you find subpar in comparison to other languages' package managers?
I would look at this comment[0] by sametmax for a critique of pip. My main gripe with virtualenv is that it's required at all: other interpreted languages, like node and elixir for example, have figured out how to handle non-global dependencies without a third-party package. Beyond that, it's frustrating to deploy because it's non-relocatable (in our build/deploy scripts at my last python job we had to use sed all over the place to fix paths), and I find it semi-annoying to have a bunch of different copies of the interpreter and all that goes with it (though this is mostly a minor annoyance -- it doesn't take up that much space and it doesn't matter if it gets out of sync).

Also notable, IMO, is the lack of a tool like rbenv or rustup for python. I can't tell you how many times I have had to try to figure out which python version a given pip executable worked with.
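
One low-tech way to answer the "which python does this pip belong to" question is to read the shebang line of the pip wrapper script. The sketch below assumes a POSIX system where the `pip` command is a small text script whose first line names its interpreter; that does not hold for Windows `.exe` launchers. The function name `interpreter_for` is made up.

```python
import shutil
from pathlib import Path
from typing import Optional


def interpreter_for(script: Path) -> Optional[str]:
    """Return whatever follows '#!' on the script's first line, if present."""
    with script.open("rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


if __name__ == "__main__":
    pip_path = shutil.which("pip")
    if pip_path is None:
        print("no pip found on PATH")
    else:
        print(pip_path, "->", interpreter_for(Path(pip_path)))
```

Running `pip --version` also reports the Python version it is installed for, which is usually the quicker check.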

[0] https://news.ycombinator.com/item?id=13460490

> like node [...] have figured out how to handle non-global dependencies

Node would be the last place I'd look for a good solution. Not sure if there was some progress recently, but it was hell some time back. Modules were huge, taking thousands of other modules with them, the majority of those being duplicates. There was no deduplication, and no version wildcards either, I believe. It wouldn't even work with some tools because the path would end up being hundreds of characters long.

Since npm 3 (about 18 months ago), the node_modules dir tree is now 'flat' and de-duped (where possible).

There have always been version wildcards as far as I know. Long paths caused by the deeply nested tree were a problem in Windows only, addressed (I believe, I can't find open issues on it) by the flattening in npm 3.

We've spotted someone who uses an OS with arbitrary path limitations...
> Also notable, IMO, is the lack of a tool like rbenv or rustup for python

Does pyenv not meet your needs there?

Oh cool, I actually hadn't seen pyenv before. Looks like it does indeed solve my problems (from a glance anyway, though I didn't see anything about pip in the readme).
`pyenv which pip` would be the command that answers the specific point you mentioned :). That also works for any bin that gets shimmed by pyenv.

It also has plugins to automatically work with venv, if you don't mind some 'magic' in your workflow.

Overall it's a solid setup.

> it's frustrating to deploy because it's non-relocatable

I've tried relocating node_modules. It's a recipe for pain and tears.

I don't see why it's a big problem that virtualenv is a package rather than built in.

I also haven't had much of a problem with virtualenv not being relocatable. If you want it somewhere else, just build it there.

>Also notable, IMO, is the lack of a tool like rbenv

Ummmmm the creator of rbenv also created pyenv.

> Beyond that, it's frustrating to deploy because it's non-relocatable (in our build/deploy scripts at my last python job we had to use sed all over the place to fix paths)

pip does cache the wheels so instead of moving the virtualenvs around, just recreate them. This also ensures the virtualenv is up to date. Using tox this is fairly easy to do.

Sure virtualenv is a bit of a hack but it's not that bad.

I'm mostly talking about moving between different machines. I would like to be able to tar up my source code and venv, distribute it to multiple machines, untar it and run it. However that's not possible with virtualenv unless you do a lot of hackery in your build. In particular, creating a virtualenv on each server during deploy is not an option.
> My main gripe with virtualenv is that it's required at all: other interpreted languages, like node and elixir for example, have figured out how to handle non-global dependencies without a third-party package.

venv is in the stdlib since 3.3. (Though I agree with the annoyance at the need.)
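
For reference, the standard-library module can also be driven from Python rather than through `python -m venv`. A minimal sketch, with `make_env` being a made-up helper name and the environment created in a throwaway temporary directory:

```python
import tempfile
import venv
from pathlib import Path


def make_env(target: Path) -> Path:
    """Create a virtual environment at target and return its pyvenv.cfg path."""
    # with_pip=False skips bootstrapping pip so the sketch stays quick and
    # offline; pass with_pip=True for an environment you intend to use.
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = make_env(Path(tmp) / "env")
        print(config.read_text())
```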

pyenv
I actually really love `Cargo`: the Rust package manager.
All you've said here is "python packaging sucks", with no explanation why, and with no alternative. Not a substantial comment, and I'm disappointed that it's been voted to the top.

I'd ask for your reasoning but it seems sametmax has done a good job of that for you:

https://news.ycombinator.com/item?id=13460490

Comments that make a single unambiguous point are fine. It's no problem to leave detailed support to replies.

A more controversial statement with the same content probably wouldn't have been voted up, but thread parent is objectively true even though it doesn't contain its own proof.

I don't disagree but your suggestion sort of reminds me of this xkcd: http://xkcd.com/927/
"This is great, but sometimes I think that python needs a new package manager from scratch"

Ha, that's what I came here to say!

Or better - a new packaging paradigm.

Maybe it's extremely uncool to say ... but I think Java still has the best packaging paradigm of all languages. Jars rule. Of course 'gradle' is kind of a confusing mess so they don't have dependencies worked out very well ...

Nevertheless I do feel that Python's packaging and dependency/versioning woes create a much bigger systematic problem than many realize.

Kudos to the author though ...