by simonw 244 days ago
It's mostly about age. Python has been around for 35 years now. The first version of a Python package directory was the cheeseshop (Monty Python reference) in 2003. The earliest version of a pip-like tool was "easy_install" which - I kid you not - worked by scraping the HTML listing page of the cheeseshop and downloading zip files linked from that!

More recent languages like Node.js and Rust and Go all got to create their packaging ecosystems learning from the experiences of Perl and Python before them.

There is one part of Python that I consider a design flaw when it comes to packaging: the sys.modules global dictionary means it's not at all easy in Python to install two versions of the same package at the same time. This makes it really tricky if you have dependency A and dependency B both of which themselves require different versions of dependency C.
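
A minimal sketch of that behaviour, assuming a hypothetical module named `depc_demo` written into two temporary directories to stand in for two versions of dependency C. Whichever copy is imported first is cached in sys.modules under that name, and every later import of the name gets the same object back:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_version(root, version):
    """Write a tiny module named depc_demo (hypothetical) into its own directory."""
    directory = Path(root) / f"v{version}"
    directory.mkdir()
    (directory / "depc_demo.py").write_text(f"VERSION = {version!r}\n")
    return str(directory)


with tempfile.TemporaryDirectory() as root:
    v1_dir = make_version(root, "1.0")
    v2_dir = make_version(root, "2.0")

    # "Dependency A" imports depc_demo while only version 1.0 is on the path.
    sys.path.insert(0, v1_dir)
    first = importlib.import_module("depc_demo")

    # "Dependency B" wants 2.0 and even puts it ahead of 1.0 on the path,
    # but the import system finds the name in sys.modules and stops there.
    sys.path.insert(0, v2_dir)
    second = importlib.import_module("depc_demo")

    sys.path.remove(v1_dir)
    sys.path.remove(v2_dir)

print(first is second)   # the same module object
print(second.VERSION)    # the version that was imported first
```

There are workarounds (loading a copy under a different name, vendoring), but the default is one module object per name per process.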

I think it's also from trying to keep with the old paradigm of "libraries are installed and managed globally, potentially as linkable object files."

All the languages of today gain all their improvements from:

1. Nothing should be global, but if it is it's only a cache (and caches are safe to delete since they're only used as a performance optimization)

2. You have to have extremely explicit artifact versioning, which means everything needs checksums, which means mostly reproducible builds

3. The "blessed way" is to distribute the source (or a mostly-source dist) and compile things in; the happy path is not distributing pre-computed binaries

Now, everything I just said above is also wrong in many aspects or there's support for breaking any and all of the rules I just outlined, but in general, everything's built to adhere to those 3 rules nowadays. And what's crazy is that for many decades, those three rules above were considered absolutely impossible, or anti-patterns, or annoying, or a waste, etc (not without reason, but still we couldn't do it). That's what made package managers and package management so awful. That's why it was even possible to break things with `sudo pip install` vs `apt install`.

Now that we've abandoned the old ways in e.g. JS/Rust/Go and adopted the three rules, all kinds of delightful side effects fall out. Tools now which re-build a full dependency tree on-disk in the project directory are the norm (it's done automatically! No annoying bits! No special flags! No manual venv!). Getting serious about checksums for artifacts means we can do proper versioning, which means we can do aggressive caching of dependencies across different projects safely, which means we don't have to _actually_ have 20 copies of every dependency, one for each repo. It all comes from the slow distributed Gentoo/FreeBSD-ification of everything and it's great!
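
To make the checksum-then-cache point concrete, here is a rough sketch of a content-addressed cache. The `store` and `load` helpers and the artifact bytes are made up for illustration and are not how any particular package manager implements it:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def store(cache_dir: Path, data: bytes) -> str:
    """Save an artifact under its own checksum and return that checksum."""
    digest = sha256_hex(data)
    path = cache_dir / digest
    if not path.exists():  # identical bytes are only ever written once
        path.write_bytes(data)
    return digest


def load(cache_dir: Path, digest: str) -> bytes:
    """Read an artifact back and refuse it if the bytes no longer match."""
    data = (cache_dir / digest).read_bytes()
    if sha256_hex(data) != digest:
        raise ValueError(f"cache entry {digest} is corrupt")
    return data


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    artifact = b"pretend this is a source tarball for dependency C"

    pin_from_project_a = store(cache, artifact)  # project A pins this checksum
    pin_from_project_b = store(cache, artifact)  # project B pins the same one

    files_on_disk = len(list(cache.iterdir()))
    restored = load(cache, pin_from_project_a)

print(pin_from_project_a == pin_from_project_b, files_on_disk)
```

Because the key is the checksum, two projects that pin the same artifact share one file, and deleting the whole directory only costs a re-download.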

If, and only if, you have actual reproducible builds, you can distribute pre-compiled binaries as a cache optimization. That can allow for speedups without necessarily compromising security. It's also a prerequisite for a lot of "supply chain" security processes which are becoming increasingly desirable.
On a tangent, the somewhat related issue of Python 3 not being able to import Python 2 packages famously led Zed Shaw of "Learn Python the Hard Way" to write a rant about how Python is not Turing Complete. I checked again and apparently he removed that rant and only has a disclaimer in its place mentioning that he was obviously being hyperbolic [0].

[0] https://learnpythonthehardway.org/book/nopython3.html#the-py...

I don’t think anyone takes Zed Shaw seriously.

He’s like that uncle you see at family gatherings whom you nod along politely to.

Zed Shaw seems to have some very interesting beliefs about the 2->3 migration in general. I think it's fair to call some of it conspiratorial.
Indeed that was a weird time, but he did eventually relent and release a version for Python 3 - https://learncodethehardway.com/client/#/product/learn-pytho...
> There is one part of Python that I consider a design flaw when it comes to packaging: the sys.modules global dictionary means it's not at all easy in Python to install two versions of the same package at the same time. This makes it really tricky if you have dependency A and dependency B both of which themselves require different versions of dependency C.

But it solves the problem that if A and B both depend on C the user can pass an object from A to B that was created by C without worrying about it breaking.

In less abstract terms, let's say numpy one day changed its internal representation of an array, so if one version of numpy read an array from a different version of numpy it would crash or, worse, read it but misinterpret it. Now if I have one data science library that produces numpy arrays and another visualization library that takes numpy arrays, I can be confident that only one version of numpy is installed and the visualization library isn't going to misinterpret the data from the data science library because it is using a different version of numpy.

This stability of installed versions has allowed entire ecosystems to be built around core dependencies in a way that would be tricky without it. I would therefore not consider it a design flaw.

I wouldn't mind a codebase where numpy objects created by dependency B can't be shared directly with dependency A without me first running some kind of conversion function on them - I'd take that over "sorry you want to use dependency A and dependency B in this project, you're just out of luck".
> I wouldn't mind a codebase where numpy objects created by dependency B can't be shared directly with dependency A without me first running some kind of conversion function on them

Given there's no compiler to enforce this check, and Python is a dynamic language, I don't see how you implement that without some complicated object provenance feature, making every single object larger and every use of that object (calling with it, calling it, assigning it to an attribute, assigning an attribute to it) impose an expensive runtime check.

But maybe I'm missing something obvious.

You let people make the mistake and have the library throw an exception if they do that, not through type checking but just through something eventually calling a method that doesn't exist.
> You let people make the mistake and have the library throw an exception if they do that, not through type checking but just through something eventually calling a method that doesn't exist.

Exceptions or crashes would be annoying, but yes, are manageable, although try telling that to new users of the language that their code doesn't work because they didn't understand the transitive dependency tree of their install and it automatically vendored different versions of a library for different dependencies, and how did they not know that from some random exception occurring in a dependency.

But as I explain in my example, the real problem is that one version of the library reads the data in a different layout from the other, so instead you end up with subtle data errors. Now your code is working but you're getting the wrong output, good luck debugging that.
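
A toy illustration of that failure mode, using `struct` rather than numpy. The `v1_write` and `v2_read` functions are hypothetical stand-ins for two versions of one library that disagree on field order. Nothing raises, the values are just wrong:

```python
import struct


def v1_write(row: int, col: int) -> bytes:
    """'Version 1' of a made-up library stores a cell as (row, col)."""
    return struct.pack("<ii", row, col)


def v2_read(blob: bytes) -> tuple[int, int]:
    """'Version 2' assumes the stored order is (col, row)."""
    col, row = struct.unpack("<ii", blob)
    return row, col


blob = v1_write(3, 7)
row, col = v2_read(blob)  # no exception, no warning
print(row, col)           # the fields come back swapped
```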

Man, there was a window there where I still fell back to easy_install on Windows because it would handle C based stuff more reliably until wheels got invented. It’s been a journey.
35 years is misleading. Python existed, yes, but was very different. e.g. Pandas was released in 2008. Most people use packages much more recent than that. 35 years ago Perl was faster than Python and had deep adoption (through 2007? or so)
How is it misleading?

The question is why Python packaging has such a complicated history. The age of the language is entirely relevant to that - the reason Go and Rust have it so good here is that they are much younger, coming out after many of the initial packaging lessons had been learned elsewhere.

It is misleading because I was around 35 years ago, and very few people were using Python. Python did not become very popular until web frameworks and pandas became a thing in Python.
If you doubt this, ask any LLM "what was the first year where Python users surpassed the number of Perl users?"
I still don't understand why that makes what I wrote "misleading". I never said Python was popular 35 years ago, I just said that the age of the language was relevant to understanding why the packaging history is complex.