HN Mirror


by roganartu 1707 days ago

> Keeping the dependency information in a database, which is queried during the resolution process, allows us to choose dependencies using criteria specified by the developer instead of merely importing the latest possible versions, as pip's backtracking algorithm does. You can specify quality criteria depending on the application's traits and environment. For instance, applications deployed to production environments must be secure, so it is important that dependencies do not introduce vulnerabilities. When a data scientist trains a machine learning model in an isolated environment, however, it is acceptable to use dependency versions that are vulnerable but offer a performance gain, thus saving time and resources.

This seems like a really bad idea to me. I could understand and perhaps get behind the idea that you might use something like this to find the optimal version of a package to use in a given project, but unexpected differences between your development environment and production are a common source of outages.
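The selection scheme in the quoted passage can be sketched in a few lines. This is not Thoth's actual resolver or data model: the `Candidate` fields, the `"production"`/`"isolated"` environment names, and the scores are hypothetical, made up for illustration. It does make the dev/prod concern concrete, since the same candidate list resolves to three different versions depending on the policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    version: tuple[int, ...]   # release as a tuple, e.g. (2, 4, 0)
    has_known_cve: bool        # hypothetical flag looked up in a vulnerability database
    benchmark_score: float     # hypothetical: higher means faster on the target hardware


def pick_latest(candidates: list[Candidate]) -> Candidate:
    """Baseline: the newest version wins, whatever its other traits."""
    return max(candidates, key=lambda c: c.version)


def pick_by_policy(candidates: list[Candidate], environment: str) -> Candidate:
    """Pick the fastest candidate that the environment's policy allows."""
    # Secure by default: only an explicitly isolated environment may use
    # versions with known vulnerabilities.
    allow_known_cves = environment == "isolated"
    pool = [c for c in candidates if allow_known_cves or not c.has_known_cve]
    if not pool:
        raise LookupError(f"no candidate satisfies the {environment!r} policy")
    # Ties on speed are broken in favour of the newer version.
    return max(pool, key=lambda c: (c.benchmark_score, c.version))


# Made-up data for illustration only.
CANDIDATES = [
    Candidate((2, 3, 0), has_known_cve=False, benchmark_score=1.3),
    Candidate((2, 4, 0), has_known_cve=True, benchmark_score=1.8),
    Candidate((2, 5, 0), has_known_cve=False, benchmark_score=1.1),
]

print(pick_latest(CANDIDATES).version)                   # (2, 5, 0)
print(pick_by_policy(CANDIDATES, "production").version)  # (2, 3, 0)
print(pick_by_policy(CANDIDATES, "isolated").version)    # (2, 4, 0)
```

Treating any unrecognised environment name as production is a deliberate choice here: a typo in the environment name then costs speed rather than security.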

It also requires using a different package manager called Thamos: https://thoth-station.ninja/docs/developers/thamos/. This tool then outputs requirements files compatible with Pipenv, pip, or pip-tools (though notably not Poetry).

That being said, all of the examples and config seem very centered around ML use cases, with the Thamos config accepting settings for OS, CPU, and CUDA versions. Is variance in performance between otherwise-compatible versions of ML packages really that big a problem?

2 comments

monkeybutton 1707 days ago

ML Engineer: Why does inference for this model take 0.9s per-call?!

Data scientist: I have no idea, inference takes 0.1s per call on average in my environment?

I jest, but I've also lived this experience: data scientists developed an algorithm on Windows with one set of wheels, and the same code was deployed to Linux with a different set of binaries and ran 10x slower. We fixed it, but it was an unnecessary headache.
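One cheap way to catch a Windows-vs-Linux mismatch like this is to record a fingerprint of each environment and diff them. The sketch below uses only the standard library (`platform`, `importlib.metadata`); the function names are hypothetical. It only compares platform and version strings: the same version number may ship as a different binary build on each platform, so an empty diff does not rule out the kind of slowdown described.

```python
from __future__ import annotations

import platform
from importlib import metadata


def environment_fingerprint(packages: list[str]) -> dict[str, str | None]:
    """Record the platform and the installed versions of the given distributions."""
    fingerprint: dict[str, str | None] = {
        "os": platform.system(),          # e.g. "Windows" or "Linux"
        "machine": platform.machine(),    # CPU architecture, e.g. "x86_64"
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            fingerprint[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            fingerprint[name] = None      # not installed in this environment
    return fingerprint


def diff_fingerprints(dev: dict, prod: dict) -> dict:
    """Return only the keys whose values differ, as (dev, prod) pairs."""
    keys = sorted(set(dev) | set(prod))
    return {k: (dev.get(k), prod.get(k)) for k in keys if dev.get(k) != prod.get(k)}


print(environment_fingerprint(["numpy", "scipy"]))

# Made-up fingerprints from two machines:
dev = {"os": "Windows", "python": "3.8.10", "numpy": "1.20.3"}
prod = {"os": "Linux", "python": "3.8.10", "numpy": "1.20.3"}
print(diff_fingerprints(dev, prod))  # {'os': ('Windows', 'Linux')}
```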


joconde 1706 days ago

When PyTorch fails to load the CUDA runtime for any reason, it falls back to CPU, often silently, and becomes more than 20 times slower on CNN inference. Not sure if this system could avoid it. Debugging that remotely on a user’s system was fun.
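A common idiom is `device = "cuda" if torch.cuda.is_available() else "cpu"`, which keeps running on CPU without complaint when CUDA is unavailable. The sketch below makes that fallback loud instead: it raises when a GPU is required and logs a warning otherwise. `torch.cuda.is_available()` is the real PyTorch call; `select_device` and its `is_available` parameter are hypothetical, the latter added only so the check can be exercised without a GPU or PyTorch installed. It does not claim to reproduce how the fallback happened in the case above.

```python
import logging
from typing import Callable, Optional


def select_device(require_gpu: bool, is_available: Optional[Callable[[], bool]] = None) -> str:
    """Return "cuda" or "cpu", refusing to fall back to CPU silently."""
    if is_available is None:
        import torch  # imported lazily so the check can be tested without PyTorch
        is_available = torch.cuda.is_available
    if is_available():
        return "cuda"
    if require_gpu:
        raise RuntimeError("CUDA runtime not available; refusing to fall back to CPU")
    logging.warning("CUDA runtime not available; running on CPU, expect much slower inference")
    return "cpu"
```

In a deployed service, calling `select_device(require_gpu=True)` at startup turns a silent 20x slowdown into an immediate, visible failure.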
