Hacker News new | ask | show | jobs
by PaulHoule 999 days ago
I worked at a place where we had machine learning systems with a big pile of dependencies that pip could not consistently resolve, I figured out what most of the technical problems where but I was still struggling with wetware problems and they eventually put me on a Scala/Typescript project instead.

One big problem is that pip just starts downloading and installing things optimistically, it does not get a global view of the dependencies and if it finds a conflict it can't reliably back out from where it is and find a good configuration. The answer is to do what maven does or what conda does and download the dependency graph of all the matching versions and get a solve before before you start downloading. Towards the end of my time on that project I had built something that assembled a "wheelhouse" of wheels necessary to run my system and would install them directly.

What I figured out was that you could download just the dependencies from a wheel with 2 or 3 range requests because a wheel is just a ZIP file and you can download the header and the directory from the end of the file and then know where the metadata is and download just that. Recently pypi got some sense and now they let you download just the metadata.

And that's the story of Python packaging. Things are really going in the right direction but progress has been slow because the community has mistaken "98% correct" (e.g. wrong) with "has 98% of the features somebody might want" It might have been a lot better if somebody with some vision and no tolerance for ambiguity had gotten in charge a long time ago.

1 comments

A new dependency resolver was introduced [0] with pip 20.3 (in 2020) which sounds like it's meant to address the problem you're describing. Were you using an earlier version of pip or is this still a problem?

[0] https://pip.pypa.io/en/latest/user_guide/#changes-to-the-pip...