by roganartu
1659 days ago
> Keeping the dependency information in a database, which is queried during the resolution process, allows us to choose dependencies using criteria specified by the developer instead of merely importing the latest possible versions, as pip's backtracking algorithm does. You can specify quality criteria depending on the application's traits and environment. For instance, applications deployed to production environments must be secure, so it is important that dependencies do not introduce vulnerabilities. When a data scientist trains a machine learning model in an isolated environment, however, it is acceptable to use dependency versions that are vulnerable but offer a performance gain, thus saving time and resources.

This seems like a really bad idea to me. I could understand and perhaps get behind the idea that you might use something like this to find the optimal version of a package to use in a given project, but unexpected differences between your development environment and production are a common source of outages.

It also requires using a different package manager called Thamos: https://thoth-station.ninja/docs/developers/thamos/. This tool then outputs requirements files compatible with Pipenv, pip, or pip-tools (though notably not Poetry).

That being said, all of the examples and config seem very centered around ML use cases, with the Thamos config accepting settings for OS, CPU, and CUDA versions. Is variance in performance between otherwise-compatible versions of ML packages really that big a problem?
Data scientist: I have no idea, inferences take 0.1s on average in my environment?
I jest, but I've also lived this experience: data scientists developed an algorithm on Windows with one set of wheels, the same code was deployed to Linux with a different set of binaries, and the whole thing ran 10x slower. We fixed it, but it was an unnecessary headache.
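
For what it's worth, here is a rough sketch of how that kind of dev/prod drift could be surfaced early, using only the Python standard library. It is not what we actually ran, and it has nothing to do with Thamos; the function names `environment_fingerprint` and `diff_fingerprints` are made up for illustration, and the version numbers in the usage example are hypothetical.

```python
import platform
from importlib import metadata


def environment_fingerprint():
    """Collect OS, architecture, interpreter, and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip broken installs that report no name
            packages[name.lower()] = dist.version
    return {
        "os": platform.system(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "packages": packages,
    }


def diff_fingerprints(dev, prod):
    """Map each differing key to a (dev_value, prod_value) tuple."""
    differences = {}
    for key in ("os", "machine", "python"):
        if dev[key] != prod[key]:
            differences[key] = (dev[key], prod[key])
    all_packages = set(dev["packages"]) | set(prod["packages"])
    for name in sorted(all_packages):
        dev_version = dev["packages"].get(name)
        prod_version = prod["packages"].get(name)
        if dev_version != prod_version:
            differences[f"package:{name}"] = (dev_version, prod_version)
    return differences


if __name__ == "__main__":
    # Hypothetical fingerprints; in practice each side would call
    # environment_fingerprint() and ship the result somewhere comparable.
    dev = {"os": "Windows", "machine": "AMD64", "python": "3.9.7",
           "packages": {"numpy": "1.21.0", "scipy": "1.7.0"}}
    prod = {"os": "Linux", "machine": "x86_64", "python": "3.9.7",
            "packages": {"numpy": "1.21.0", "scipy": "1.6.3"}}
    for key, (dev_value, prod_value) in diff_fingerprints(dev, prod).items():
        print(f"{key}: dev={dev_value} prod={prod_value}")
```

Note the limitation: this only catches differences in OS, architecture, interpreter, and package versions. Two wheels with the same version number can still ship different native binaries, which is the case that bit us, so a check like this complements benchmarking on the target platform rather than replacing it.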