Hacker News
by renesd 3439 days ago
I think python packaging has gotten LOTS better in the last few years. I find it quite pleasurable to use these days.

From binary wheels (including on different Linux architectures), to things like local caching of packages (taking LOTS of load off the main servers), to the pypa organisation on GitHub [0], to `python -m venv` working.

Also lots of work around standardising things in peps, and writing documentation for people.

I would like to applaud all the hard work people have done over the years on python packaging. It really is quite nice these days, and I look forward to all the improvements coming up (like pipenv!).

I'd suggest people check out fades [1] (for running scripts and automatically downloading dependencies in a venv), as well as conda [2], the alternative package manager.

[0] https://github.com/pypa/

[1] https://fades.readthedocs.io/en/release-5/readme.html#what-d...

[2] http://conda.pydata.org/docs/intro.html

+1. Relative to what we had before, it's so much better. But compared to the JS/Rust ecosystem, we are behind.

Now it's hard to compete with JS on some stuff: it's the only language on the most popular dev platform (the web) and it has one implicit standardized async model by default.

It's hard to compete with Rust on some stuff: it's compiled and fast, can provide standalone binaries easily and has a checker that can avoid many bugs.

But this. The package manager. We can compete. And yet we are late.

It's partially my fault since it's a project I had in mind for years and never took the time to work on. It's partially everybody's fault I guess :)

It was definitely your fault. Thanks for nothing.

:) In the open source world, and given my skills, I always feel a bit responsible when a tool doesn't exist.

I'd strongly disagree with Python being behind relative to JS. They're roughly on par (with Python being slightly ahead due to smarter caching and more deterministic builds). Unfortunately, neither does namespacing or good versioning, especially in the cache. Take, for example, Maven (from Java): it caches namespaced and versioned dependencies and is pretty reliably deterministic in the build process.

Versus node, which makes horrible assumptions that allow dependencies of libraries to be resolved and dumped into the same library namespace as your application. It also allows users to use the shrinkwrap feature, which imposes the dependency's settings on me (say, for example, I want to use my own npmjs proxy, this is bypassed...)

Wow, Javascript, really? I am guessing you don't actually work with NPM a lot.

One suspects it's you who hasn't distributed or installed many modules on either python or node. So many of the problems that python has, simply don't exist for node, because it finds modules in a bottom-up hierarchical fashion. That allows a single app or module to use modules that in turn use different versions of other modules, and not to worry about what other modules are doing, or how other modules are installed, or how node is installed, or what version of node is installed. This prevents the traditional "dependency hell" that has plagued devs for decades. Thanks to tools like browserify and webpack, the browser may also benefit from this organization.
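
To make the "bottom-up hierarchical" lookup concrete, here is a minimal sketch in Python of the order in which node_modules directories are tried. The function name and the example paths are made up for illustration, and the sketch only covers the directory walk, not file extensions or package.json "main" handling.

```python
from pathlib import PurePosixPath
from typing import List


def node_modules_candidates(start_dir: str) -> List[str]:
    """List the node_modules directories tried for a require() made from start_dir.

    The search starts next to the requiring file and walks up to the
    filesystem root, so the nearest copy of a module wins.
    """
    candidates = []
    current = PurePosixPath(start_dir)
    for directory in [current, *current.parents]:
        # A directory that is itself called node_modules is not given a
        # nested node_modules/node_modules candidate.
        if directory.name == "node_modules":
            continue
        candidates.append(str(directory / "node_modules"))
    return candidates


# Hypothetical layout: module "a" is installed inside the app at /app.
for path in node_modules_candidates("/app/node_modules/a"):
    print(path)
```

Because module "a" looks in its own node_modules first, it can carry a private version of a dependency that differs from the one the app itself uses.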

On top of all that, npm itself just does so many things right. It's quite happy to install from npm repos, from dvcs repos, from regular directories, or from anything that looks like a directory. It just needs to find a single file called "package.json". It requires no build step to prepare a module for upload to an npm repo, but it easily allows for one if that's necessary. package.json itself is basically declarative, but provides scripting hooks for imperative actions if necessary. At every opportunity, npm allows devs to do what they need to do, the easy way.

In a sense, node and npm are victims of their own quality. The types of "issues" (e.g. too many deps, too many layers of deps, too many versions of a particular dep, deps that are too trivial, etc.) about which anal code puritans complain with respect to node simply couldn't arise on other platforms, because dependency hell would cause the tower of module dependencies to collapse first. node happily chugs along, blithely ignoring the "problems".

Personally, I used to be able to build python packages for distribution, but since I've been spoiled by node and npm for several years I've found I simply can't do that for python anymore. It is so much harder.

npm has its own special problems. disclaimer: what I'm talking about in this post is at least six months old, which in node/npm/js world is ancient history.

> it finds modules in a bottom-up hierarchical fashion. That allows a single app or module to use modules that in turn use different versions of other modules, and not to worry about what other modules are doing

To my understanding, if your app transitively depends on package foo-1.2 in thirty different places [0], there will be thirty copies of foo-1.2 on disk under node_modules/ . Each package reads its very own copy of foo-1.2 when it require()s foo.

On a large app, that adds up to a lot of inodes ("why does it say my filesystem is full? there's only 10G of stuff on my 80G partition!" because it's used up all its inodes, not its bytes.) and a _lot_ of unnecessary I/O. The second through thirtieth copies of foo-1.2 don't come from the kernel's block cache "for free", they come from spinning rust (or if you're lucky, the dwindling number of IOps your SSD can choke out. Do you pay money for provisioned IOps?).

[0] and thirty is a lowball number for some projects, especially given the community's preference to require "leftpad" or whatever instead of writing a couple lines in their own projects
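
One rough way to check how much duplication a nested install produces is to count identical name@version pairs on disk. This is a sketch, not an npm feature: `count_installed_copies` is a hypothetical helper, and it assumes every installed package has a package.json with "name" and "version" fields directly inside a node_modules directory (scoped @org/name packages are ignored for brevity).

```python
import json
import os
from collections import Counter


def count_installed_copies(node_modules_root):
    """Count how often each name@version appears under a node_modules tree."""
    copies = Counter()
    for dirpath, _dirnames, filenames in os.walk(node_modules_root):
        # Only directories sitting directly inside a node_modules folder
        # are treated as installed packages.
        parent = os.path.basename(os.path.dirname(dirpath))
        if parent != "node_modules" or "package.json" not in filenames:
            continue
        with open(os.path.join(dirpath, "package.json"), encoding="utf-8") as handle:
            manifest = json.load(handle)
        key = "{}@{}".format(manifest.get("name"), manifest.get("version"))
        copies[key] += 1
    return copies
```

Calling `count_installed_copies("node_modules").most_common(10)` in a project directory would list the ten most duplicated packages.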

> ...what I'm talking about in this post is at least six months old...

Haha npm@3 was out June 2015. b^)

I agree that it would have been better, on balance, for previous versions to have created hard links to already-installed modules. Actually that wouldn't be a bad option to have even now, since debugging is often easier when one has a deep directory structure to explore rather than hundreds of random names in the top-level node_modules directory. That is, if I know the problem is in module foo, I can pushd to node_modules/foo, find the problematic submodule again, and repeat until I get all the way to the bottom. [EDIT: it occurs to me that having all these hard links would make e.g. dependency version updates easier, since un-updated dependencies wouldn't have to be recopied, unix-stow-style.]
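
The hard-link idea can be sketched in a few lines of Python. This is only an illustration of the technique, not something npm does: `link_tree` is a hypothetical helper, and it assumes both directories live on the same filesystem, since hard links cannot cross filesystems.

```python
import os


def link_tree(installed_dir, target_dir):
    """Recreate installed_dir at target_dir, hard-linking files instead of copying."""
    for dirpath, _dirnames, filenames in os.walk(installed_dir):
        relative = os.path.relpath(dirpath, installed_dir)
        destination = os.path.normpath(os.path.join(target_dir, relative))
        # Directories cannot be hard-linked, so each one is created afresh.
        os.makedirs(destination, exist_ok=True)
        for filename in filenames:
            # os.link adds a second directory entry for the same inode,
            # so the file's data is not duplicated on disk.
            os.link(
                os.path.join(dirpath, filename),
                os.path.join(destination, filename),
            )
```

The deep directory structure stays browsable, while every file in the second and later copies shares its inode and data blocks with the first.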

To me, the more amusing file descriptor problem is caused by the module "chokidar", which when used in naive fashion tries to set up watches on all 360 files and directories created by itself and its own 55 dependencies. At that point it's real easy to run out of file watches altogether. Some of the utilities that call chokidar do so while ignoring node_modules, but many do not.

Your concern is somewhat ancient history in the node/npm world. NPM >= 3 flattens the dependency hierarchy as much as it thinks possible. (3.0.0 was released to beta in Jun 2015; NPM >= 3 has been bundled with Node LTS since "Boron" in Oct 2016)

Subsequent versions of NPM continue to improve upon this flattening effort.
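
As a rough illustration of what "flattening" means, the sketch below hoists each dependency to the top level unless a different version of the same name is already there. It is a deliberately simplified model and not npm's actual algorithm; the tuple format and the `flatten` name are assumptions made for the example.

```python
from collections import deque


def flatten(dependencies):
    """Hoist dependencies to the top level where versions do not conflict.

    dependencies is a list of (name, version, sub_dependencies) tuples.
    Returns (top_level, nested): top_level maps name -> hoisted version,
    nested lists (parent, name, version) entries that must stay below
    their parent because another version already holds the top-level slot.
    """
    top_level = {}
    nested = []
    queue = deque((None, dep) for dep in dependencies)
    while queue:
        parent, (name, version, children) = queue.popleft()
        if name not in top_level:
            top_level[name] = version
        elif top_level[name] != version:
            nested.append((parent, name, version))
        # Same name and version already hoisted: nothing more to install.
        queue.extend((name, child) for child in children)
    return top_level, nested


tree = [
    ("a", "1.0", [("foo", "1.2", [])]),
    ("b", "1.0", [("foo", "1.2", [])]),
    ("c", "1.0", [("foo", "2.0", [])]),
]
top_level, nested = flatten(tree)
print(top_level)
print(nested)
```

In this toy tree, the two requests for foo 1.2 collapse into one top-level copy, and only the conflicting foo 2.0 stays nested under "c".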

I work with NPM every day, and it works swimmingly. `npm install packagename --save`, doesn't get much easier than that.

This is actually one of the big problems, I think: Python packaging involves knowing a number of different things and reading various resources to get the full picture.

Recently, I built a small CLI tool in Python, and learned all of the bits needed to build, test and package my application "the right way". I knew Python syntax before, but it was a lot of effort to set this up. The difference in the experience between Python and Rust or .NET Core is actually shocking, and most of it isn't down to anything that Python couldn't do, just the current state of the tooling.

Could you provide some specific examples of the "shocking" difference?

Python best practice: figure out the correct directory structure by reading docs and looking at GitHub repositories, learn how to write setup.py & setup.cfg & requirements.txt & MANIFEST.in files, set up py.test and tox (because Python 2 still lives), write your README in RST format (as used by nothing else ever), and for bonus points: write your own Makefile. Get depressed when you realize that target platforms either don't have Python or have the wrong version.

Rust: type "cargo new", README and doc comments in Markdown, type "cargo test" and "cargo build".

I'm being deliberately snarky, but you get the point: there has been a slow accretion of complexity over a very long time, and most of it is not the language itself.

Thank you. It is exactly what I have asked for.

The Python best practice: you do not need "correct directory structure" to start programming in Python--there is no boilerplate structure to create (and that would require a tool): REPL, Jupyter notebook in a browser, a single file (module) is enough in a simple case.
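
For the simple case, a single module using only the standard library can already be a small CLI tool. The sketch below is an illustration; the names `format_greeting` and `main` and the greeting behaviour are invented for the example.

```python
import argparse


def format_greeting(name, shout=False):
    """Build the greeting text; kept separate so it is easy to test."""
    greeting = "Hello, {}!".format(name)
    return greeting.upper() if shout else greeting


def main(argv=None):
    """Parse command-line arguments and print a greeting."""
    parser = argparse.ArgumentParser(description="Greet someone from the command line.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--shout", action="store_true", help="upper-case the greeting")
    args = parser.parse_args(argv)
    print(format_greeting(args.name, shout=args.shout))
    return 0
```

Saved as, say, greet.py with a call to `main()` under the usual `if __name__ == "__main__":` guard, it would run as `python greet.py Ada --shout` with no setup.py, no directory layout and no build step.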

There is no single "correct directory structure" for more complex cases either e.g., a Django web app and Ubuntu's Appindicator would probably have different structures and it is likely that there are tools that create corresponding project skeletons such as "django-admin startproject mysite".

Usually, packaging and deploying are not nice & neat and differ depending on your project and chosen platform--either you follow an example or you have to study a never-ending list of incomplete/contradicting standards/semi-documented rules & conventions--it is unrelated to Python (even if tools such as Ansible, Salt are implemented in Python). Naturally, there are things that can be improved.

btw, reStructuredText is a great text format for documenting Python projects (it scales from a simple README.rst to a whole book). You can always convert README.md written in your favorite Markdown flavour using pandoc or a similar tool. Markdown is great for a short human-generated plaintext content common on Github, Stack Overflow.

> there is no boilerplate structure to create (and that would require a tool): REPL, Jupyter notebook in a browser, a single file (module) is enough in a simple case.

Rhetorical question: is Python a scripting language or an applications language?

If it's a scripting language, then yes, a single file is all that you need. Otherwise, you need a directory structure with various support files to be able to create and build your CLI tool or library, and you need to follow conventions if you are going to publish your work as Open Source. Which in turn requires you to know which tools to use, and what they require.

Go raised the bar here, I think. Rust and .NET Core both follow the Go idea of having one top level tool that gives you what you need to test and package your library or CLI application. That means that new users can get success with at least two basic use cases very rapidly and with little effort. Which in turn means that the previous status quo is no longer so tolerable.

> btw, reStructuredText is a great text format for documenting Python projects (it scales from a simple README.rst to a whole book).

Sure, the big problem with RST is that it lost the fight. Every developer who uses GitHub has to know a little Markdown, not RST or something else. For larger documents, AsciiDoc is probably the thing. I used to be a Textile loyalist, myself, but stopped fighting against the tide and switched to Markdown some years ago.

Python is a general-purpose programming language. No one size fits all.

You've ignored "in a simple case" and ignored the next paragraph that discussed "more complex cases." Suppose I have a task: publish an Ubuntu PPA with a package that does X, or deploy a web app on a PaaS, or add a component that massages some data for a numerical simulation on a cluster, or something else. I do not see what Go, Rust, .NET do here that Python can't.

Obviously, there could be specific use-cases where some language/framework has advantages over other languages/frameworks and (as I said) there may be use-cases that can be improved, which is why they should be named explicitly -- that is why I asked in the first place. So far I don't see a "shocking difference" in the general case (I don't doubt that you experienced the shocking difference for your specific (unnamed) task).

`setup.py` is shockingly awful compared to most other solutions.

Conda is great software but the Conda/Anaconda ecosystem is a mess.

Can you elaborate?