by iakov 1438 days ago
IMO it's not DL libraries, it's Python. Python sucks at managing dependencies. It's a hilarious mess of pipenv, poetry, conda, vex, pex, shmex and god knows what else is hot now. It seems that every time I want to write a simple Python utility, there is a new way to install and track dependencies.

I always see this and I wonder what other people are doing wrong. pip and a requirements.txt has worked for over a decade (along with virtualenv.) Yes, you may run into issues with larger projects and need something more complex. However, if you just want a list of dependencies and how to install them for a "simple" standalone utility like you describe, pip is the way to go.
It's Python and the people using it - in theory pip + requirements.txt + virtualenv/conda should work, but here's over 1 million issues of people asking for requirements files or fixes to requirements files: https://github.com/search?q=requirements+file&type=issues

PS: For ML/DL enthusiasts, there is a small Ruby DL scene with some nice ported libraries done by the amazing @ankane. It's still in its infancy right now, but if more folks use it, maybe we can bring Ruby to the DL mainstream? https://ankane.org/new-ml-gems

> I always see this and I wonder what other people are doing wrong. pip and a requirements.txt has worked for over a decade (along with virtualenv.)

Because Python is just the tip of the iceberg. Python in machine learning is the glue and API that ties together multiple high-performance numerical libraries and frameworks for CPU and GPU. Now you are fighting with your CUDA libraries, PyTorch and its toolkit versions, BLAS, MKL, LAPACK, versions of your NVIDIA cards, support for non-standard floating-point types (fp16, fp8), you name it. It is a zoo out there; Python is just the obvious fall guy.

I can believe that. No Python (or other language) dependency solution will be able to solve that. You need OS level configuration management. The average dev just installs and upgrades things willy-nilly.
The open question here is whether a central repository at the OS level solves anything. Logically, given reliable networking and fat pipes, a dependency could come from anywhere, so centralization doesn't appear to be the issue. The issue is the version explosion of a nominal dependency D across applications, and figuring out which alternative versions are acceptable substitutes when there is no exact match. (And of course D will have its own set of dependencies.) If you can 'can' that into a systematic way to declare (exact), find (best-attempt match), and use (compatible) dependencies, you can serve those dependencies from the network, a canonical source, or a local cache. Only then have you solved the problem.
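
To make the declare / find / use idea concrete, here is a rough sketch of what a resolver might look like. It is only an illustration under loud assumptions: versions are plain `major.minor.patch` strings, "compatible" means "same major version and newer than what was declared", and `resolve` is a made-up helper, not how pip or any real resolver works.

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a 'major.minor.patch' string into a comparable tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def resolve(declared: str, available: Iterable[str]) -> Optional[str]:
    """Return the exact declared version if available, else the nearest
    newer version with the same major number, else None.

    ASSUMPTION: "same major, newer" is treated as compatible. Real
    projects may not follow that convention.
    """
    want = parse(declared)
    candidates = [parse(v) for v in available]
    if want in candidates:
        return declared  # exact match: use it as declared
    compatible = [c for c in candidates if c[0] == want[0] and c > want]
    if not compatible:
        return None  # nothing acceptable: fail loudly instead of guessing
    nearest = min(compatible)  # closest newer version within the same major
    return ".".join(str(part) for part in nearest)
```

The candidate list could come from the network or a local cache; the matching rule is the part that has to be agreed on, which is the point being made above.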
Probably not. You need someone actually vetting dependencies, developing setup / onboarding scripts that install actual, approved, verified-working dependencies. OS packages are held, frozen at a specific version. Third-party installs are the same. You don't upgrade random stuff.

Obviously this is a lot less "agile" than most of us are used to.
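
As a minimal sketch of what such an onboarding check could look like: compare what is installed against an approved, pinned list and report any drift. The `find_drift` helper and the shape of the `approved` mapping are hypothetical; only `importlib.metadata` from the standard library (Python 3.8+) is used.

```python
from importlib import metadata
from typing import Dict, List


def find_drift(approved: Dict[str, str]) -> List[str]:
    """Return human-readable problems for packages that are missing or
    installed at a version other than the approved pin.

    `approved` maps distribution name -> exact approved version string.
    """
    problems = []
    for name, pinned in approved.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (approved {pinned})")
            continue
        if installed != pinned:
            problems.append(f"{name}: installed {installed}, approved {pinned}")
    return problems
```

Running something like this in CI or a setup script, and failing if the list is non-empty, is one way to enforce "you don't upgrade random stuff".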

Once you learn the basics of pip/venv that should mostly work for everything.

Make a new venv for everything and don’t pollute the global environment and it should be fine.
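
For what it's worth, "a new venv for everything" can be scripted with nothing but the standard library. This is a rough sketch, assuming Python 3.8+; `create_env` and `venv_python` are made-up helper names, and the requirements file is optional.

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import Optional


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir: Path, requirements: Optional[Path] = None) -> Path:
    """Create a fresh venv at env_dir and optionally install pinned requirements.

    pip is only bootstrapped when a requirements file is given.
    """
    builder = venv.EnvBuilder(with_pip=requirements is not None, clear=True)
    builder.create(env_dir)
    python = venv_python(env_dir)
    if requirements is not None:
        # Use the venv's own interpreter so nothing touches the global environment.
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

Something like `create_env(Path(".venv"), Path("requirements.txt"))` per project keeps each utility's dependencies isolated from the others.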

> Make a new venv for everything and don’t pollute the global environment and it should be fine.

This just proves the point that Python sucks at managing dependencies, which exacerbates -- perhaps even encourages -- the reproducibility issues being discussed.

How's that? Similar approaches are used elsewhere. Python just makes the creation of the virtualenv more explicit. npm doesn't pollute the global environment either, by default.