by TrackerFF 1438 days ago
First hurdle is to simply get the (more often than not, Python) dependencies to work. I've worked on reproducing some relatively simple DL programs - written by academics I know - where I've literally spent days to weeks just to get all the dependencies right. And I've had direct contact with them - which may absolutely not be the case for other people.

I don't know why DL libraries are so afflicted by this, maybe things just move so fast. But it is such a pain in the ass.


IMO it's not DL libraries, it's Python. Python sucks at managing dependencies. It's a hilarious mess of pipenv, prose, conda, vex, pex, shmex and god knows what else is hot now. It seems that every time I want to write a simple Python utility, there is a new way to install and track dependencies.
I always see this and I wonder what other people are doing wrong. pip and a requirements.txt has worked for over a decade (along with virtualenv.) Yes, you may run into issues with larger projects and need something more complex. However, if you just want a list of dependencies and how to install them for a "simple" standalone utility like you describe, pip is the way to go.
It's Python and the people using it - in theory pip + requirements.txt + virtualenv/conda should work, but here's over 1 million issues of people asking for requirements files or fixes to requirements files: https://github.com/search?q=requirements+file&type=issues

PS: For ML/DL enthusiasts, there is a small Ruby DL scene with some nice ported libraries done by the amazing @ankane - super infant right now, but if more folks use it, maybe we can bring Ruby to the DL mainstream? https://ankane.org/new-ml-gems

> I always see this and I wonder what other people are doing wrong. pip and a requirements.txt has worked for over a decade (along with virtualenv.)

Because Python is just the tip of the iceberg. In machine learning, Python is the glue and API that ties together multiple high-performance numerical libraries and frameworks for CPU and GPU. Now you are fighting with your CUDA libraries, PyTorch and its toolkit versions, BLAS, MKL, LAPACK, the versions of your NVIDIA cards, support for non-standard floating-point types (fp16, fp8), you name it. It is a zoo out there; Python is just the obvious fall guy.
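To make the "iceberg" a bit more concrete, here is a rough sketch of a script that reports which layers of the stack are actually present. The function names (`probe_version`, `describe_stack`) and the default module list are made up for illustration; `torch.version.cuda` and `torch.cuda.is_available()` are only read if PyTorch imports cleanly. It does not see driver versions, BLAS/MKL builds, or the GPU model, so treat it as a starting point rather than a full inventory.

```python
import importlib
import platform


def probe_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Broken native installs can raise more than ImportError (e.g. OSError).
        return None
    return getattr(module, "__version__", "unknown")


def describe_stack(module_names=("numpy", "scipy", "torch")):
    """Collect interpreter, OS, and library versions into one dict."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in module_names:
        report[name] = probe_version(name)

    # torch.version.cuda is the CUDA version PyTorch was built against
    # (None for CPU-only builds); only read it if torch imported cleanly.
    if report.get("torch") is not None:
        import torch
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report
```

Printing the result of `describe_stack()` and pasting it into a bug report or README at least tells the next person which combination was known to work.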

I can believe that. No Python (or other language) dependency solution will be able to solve that. You need OS level configuration management. The average dev just installs and upgrades things willy-nilly.
Open question here is whether a central repository at OS level solves anything. Logically, modulo reliable networking and fat pipes, the dependency could be coming from anywhere. So centralization doesn't appear to be the issue. The issue is the degree of version explosion for nominal dependency D by applications and figuring out acceptable transitive relations (alternative version) for imprecise matches. (And of course D will have its own set of dependencies.) If you can 'can' that, a systemic way to declare (exact), find (best attempt) match, and use (compatible) dependencies, you can serve up those dependencies from the network or canonical source or a local store cache. And only then you have solved the problem.
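As a toy illustration of the "declare (exact), find (best attempt) match, use (compatible)" idea above: the sketch below assumes purely numeric dotted versions and assumes that "same major version, not older than requested" means compatible. That is a semver-style assumption that plenty of real libraries do not honor, and `parse_version` / `best_match` are hypothetical helpers, not any package manager's API.

```python
def parse_version(text):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def best_match(requested, available):
    """Pick a version for a dependency that was declared as an exact version.

    Returns the exact version if it is available; otherwise the highest
    available version with the same major number that is not older than
    the request; otherwise None.
    """
    if requested in available:
        return requested
    wanted = parse_version(requested)
    compatible = [
        candidate for candidate in available
        if parse_version(candidate)[0] == wanted[0]
        and parse_version(candidate) >= wanted
    ]
    if not compatible:
        return None
    return max(compatible, key=parse_version)
```

The hard part the comment describes is exactly what this skips: every candidate drags in its own dependencies, so the same decision has to hold transitively.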
Probably not. You need someone actually vetting dependencies, developing setup / onboarding scripts that install actual, approved, verified-working dependencies. OS packages are held, frozen at a specific version. Third-party installs are the same. You don't upgrade random stuff.

Obviously this is a lot less "agile" than most of us are used to.
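A minimal sketch of what the "verified-working dependencies" check in an onboarding script could look like, using only the standard library's `importlib.metadata`. The `approved` mapping and the `check_pins` helper are hypothetical; a real setup would load the pins from whatever file the vetting process produces.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(approved, lookup=installed_version):
    """Compare approved {name: version} pins with what is installed.

    Returns a list of (name, expected, found) tuples, one per mismatch.
    """
    problems = []
    for name, expected in sorted(approved.items()):
        found = lookup(name)
        if found != expected:
            problems.append((name, expected, found))
    return problems
```

An onboarding script could refuse to continue when `check_pins(...)` returns a non-empty list, which is the "you don't upgrade random stuff" policy expressed as code.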

Once you learn the basics of pip/venv that should mostly work for everything.

Make a new venv for everything and don’t pollute the global environment and it should be fine.
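For anyone who hasn't done this before, a small sketch of "a new venv for everything" using the standard library's `venv` module. The `.venv` directory name and the `make_project_env` helper are just conventions picked for this example.

```python
import os
import venv
from pathlib import Path


def make_project_env(project_dir, with_pip=True):
    """Create <project_dir>/.venv and return the path to its interpreter."""
    env_dir = Path(project_dir) / ".venv"
    venv.create(env_dir, with_pip=with_pip)
    # Windows puts the interpreter under Scripts\, everything else under bin/.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Running the returned interpreter with `-m pip install -r requirements.txt` then installs into that environment only, leaving the global one untouched.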

> Make a new venv for everything and don’t pollute the global environment and it should be fine.

This just proves the point that Python sucks at managing dependencies, which exacerbates -- perhaps even encourages -- the reproducibility issues being discussed.

How's that? Similar approaches are used elsewhere. Python just makes the creation of the virtualenv more explicit. npm doesn't pollute the global environment either, by default.
> maybe things just move so fast

This is definitely the case in DL (and I'm assuming elsewhere too but I wouldn't know).

I've honestly lost count of the 1-2 year old paper GitHub repos I've tried to run where some missing detail (like the Python version!) makes it non-trivial to run them as is. Libraries make undocumented breaking changes, the pickle format is wrong, the authors used a nightly version which never made it to a tagged release, and so on.
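A sketch of the kind of snapshot that would have saved me a lot of that time: record the Python version and the exact installed package versions next to the code. It is roughly what `pip freeze` gives you plus the interpreter version; `snapshot_environment` and `write_snapshot` are names invented for this example, and it will not capture nightly build hashes or anything outside Python packaging (CUDA, drivers).

```python
import platform
from importlib import metadata


def snapshot_environment():
    """Return the Python version plus 'name==version' lines for installed packages."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs with no readable name
            pins.add(f"{name}=={dist.version}")
    return {
        "python": platform.python_version(),
        "packages": sorted(pins, key=str.lower),
    }


def write_snapshot(path):
    """Write the snapshot to a text file that can sit next to the paper's code."""
    snap = snapshot_environment()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(f"# python {snap['python']}\n")
        handle.write("\n".join(snap["packages"]) + "\n")
```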

This perhaps says also something about the CS (versus software eng) background that most people engaging in DL publishing have.

> This perhaps says also something about the CS (versus software eng) background that most people engaging in DL publishing have.

Are those things enjoyable? Or is hacking and playing with ideas enjoyable?

Huge portions of PhD students spent time as software engineers prior to starting their programs. It's not about know-how. It's about not being paid to engineer systems in addition to doing research.

Fewer than 1 in 100 labs have dedicated software engineers, and PhD students are paid $30K/yr. There's no way in hell most of them are going to spend their time doing dependency management or setting up CI/CD pipelines for that salary. If they wanted to spend their time doing software engineering, they can (and would) move to an industry SWE job at 10x the total comp.

You hit the nail on the head. DL library maintainers basically have no respect for backwards compatibility or for ensuring everything works. New versions that break existing APIs are pushed out on a weekly basis, and no one really cares, because dependency management has been abstracted so far away that maintainers don't even understand the repercussions of this 'move fast, break things' mindset (namely, lots of broken software).