by jphoward 820 days ago
Is anyone in the AI/ML area finding success with anything other than conda, where installation of CUDA/cuDNN is required? Although I often have to pip install a lot of packages, I find conda's nvidia/pytorch/conda-forge channels are still by far the easiest way to get a deep learning stack up and running, and so I just stick with conda environments. I've tried poetry in the past but getting the NVIDIA deep learning stack up and running was really tough.
9 comments

For anything related to CUDA/cuDNN, use one of NVIDIA's base Docker images. Then whether you use Conda / Pip / Poetry / Pipenv does not matter much. I'm not at all a Conda fan myself and avoid it like the plague.
What's surprising to me is that this isn't better known. The only reliable solution I've found is to go with the pytorch or deepstream images from NGC. Conda is probably a good idea for noobs who need Cuda installed for them on windows, but otherwise I find it an endless source of finicky issues, especially for unsavvy ML scientists who are looking for a silver bullet for package management.

This link shows which package versions come in which Docker tag and is invaluable: https://docs.nvidia.com/deeplearning/frameworks/support-matr...

10 years ago, « Data Science » work past the experimental stage was performed by SWE with a knack for applied maths. So investing in tooling to do things properly was a given.

Nowadays, most DS people want to do ML only at the experimental stage and get lost when things move to the engineering side. In their defense, the bare-minimum skill set now includes programming, containerization, CI/CD, etc. More experienced, Swiss-army-knife SWEs/MLEs have to educate the willing.

It was already the same 10 years ago with MATLAB dudes not wanting to get dirty with C/C++/ASM SIMD. History repeats itself, only at a faster pace.

Yes. I simply do

  python -m pip install torch torchvision
and it works. It used to not, but it's been fine for me for about a year now.

There's a very good chance I've installed cuda on my system before this though. And usually cudnn and some other packages because this is part of my standard install. And then I also never run into the issue where a package is looking for nvcc.
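A quick way to see which of those pieces are actually present on a machine is a small check script. This is only a sketch: the function name `describe_gpu_setup` is made up for illustration, and it reports nothing beyond whether `torch` imports, whether it sees a GPU, and whether `nvcc` is on the PATH.

```python
import importlib.util
import shutil


def describe_gpu_setup() -> dict:
    """Collect a few facts about the local PyTorch/CUDA setup.

    Hypothetical helper for illustration; it only inspects, never installs.
    """
    info = {
        # True if a torch package can be found on this interpreter's path.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        # Path to the CUDA compiler if a system toolkit is on PATH, else None.
        "nvcc_path": shutil.which("nvcc"),
        "cuda_available": None,
        "torch_cuda_version": None,
    }
    if info["torch_installed"]:
        try:
            import torch
        except (ImportError, OSError):
            # torch is present but broken (e.g. missing shared libraries).
            return info
        # Whether torch can actually reach a GPU through the driver.
        info["cuda_available"] = torch.cuda.is_available()
        # CUDA version torch was built against; None for CPU-only builds.
        info["torch_cuda_version"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in describe_gpu_setup().items():
        print(f"{key}: {value}")
```

If `nvcc_path` comes back as `None` while `cuda_available` is `True`, torch itself is fine, but a package that compiles CUDA extensions at install time would likely still fail.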

I love poetry but have found it pretty hard once you depend on anything that doesn't manage to get wheels on PyPI.

We make extensive use of conda/mamba to solve this, and are pretty happy with it, especially with conda-forge.

I have successfully transitioned an ML/AI team of seasoned researchers away from conda and to poetry. Some also use pyenv, I suspect a lot don't bother but may get bitten eventually.

It's definitely a learning curve, but it turns out every conda user has been bitten by the irreproducible tendencies of conda quite often. Nobody uses the conda env file, they just start an env and pip install things into it. They don't realize the base env has stuff, too, and conda envs are hierarchical rather than isolated. I know it's possible to use conda in an isolated and reproducible way, but have yet to meet someone who does so.

So it hasn't been hard to pitch poetry to these folks, and while many complain about the learning curve they appreciate the outcomes.

We're a pytorch shop, and torch mostly just works with pip or poetry these days, as long as you skip the versions the torch maintainers mispackaged. We rarely need anything higher-level that only conda could install.

We really like having more than two dependency groups as this allows us to keep research and production in the same repository. main, dev, research. Then researchers contribute to the core library of a project and keep research and production using the same code for running and evaluating models.

I use pipenv and I've found it to be much more usable than conda. For my use cases, it's generally faster and I've run into fewer dependency issues.
uv has been really awesome as a replacement for pip: https://github.com/astral-sh/uv

So fast it finally made virtual environments usable for me. But it's not (yet) a full replacement for conda, e.g. it won't install things outside of Python packages

How about prefix then? https://prefix.dev/blog/uv_in_pixi
Pyenv just worked for me. I am actually using Fedora Silverblue and have GCC and the CUDA SDK available only inside a toolbox container. Therefore, I have to enter that toolbox to install things like FlashAttention.
Have you tried https://pixi.sh/ ? It brings Cargo/NPM/Poetry-like commands and lock files to the Conda ecosystem, and can now manage and lock PyPI dependencies alongside by using uv under the hood.

I haven't been using anything CUDA, but the scientific geospatial stack is often a similar mess to install, and it's been handling it really well.

I use poetry and direnv. Coming from node/npm, it feels natural for me to just do this. I really have no trouble installing Pytorch with poetry.
How are you installing Pytorch with CUDA with Poetry? I stopped using Poetry because it wouldn't automatically get the CUDA version; instead, it would install the CPU version. I migrated to PDM, which does the right thing.
Before CUDA 12.0 you had to specify a field in pyproject.toml like this:

    [tool.poetry.dependencies]
    python = ">=3.10,<3.12.0"
    torch = {version = "^2.0.1+cu118", source = "torch118"}
    torchvision = {version = "^0.15.2+cu118", source = "torch118"}

    [[tool.poetry.source]]
    name = "torch118"
    url = "https://download.pytorch.org/whl/cu118"
    priority = "explicit"
However, since CUDA 12.0 and Pytorch 2.1.0, you can just install like normal:

    poetry add torch torchvision
I stand corrected. I was familiar with the first option, which coupled the dependencies with the platform, whereas I wanted a CUDA version on Linux and a Metal version on macOS.

However, this works perfectly with Poetry 1.8 and Pytorch 2.2. I suppose the only problem is what PDM also does, where the lock file is platform-dependent. I'm not sure whether Poetry allows you to select a specific lock file, however.

Was this before torch 2.0? With the very notable exceptions of a few mispackaged versions, torch now includes all the relevant Nvidia libs, and I haven't seen it grab the CPU version on a GPU box yet, though I'm not sure what it looks for.

A notable open issue in poetry is that we can't currently specify one dependency on torch and have it grab the CPU version on some systems and the GPU version on others. Does PDM solve that?
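One way to tell which build ended up in an environment is to look at the local tag after the `+` in the version string, such as the `cu118` in the pyproject.toml above. The sketch below is a guess at a useful check rather than anything official: `local_build_tag` and `looks_like_cpu_build` are hypothetical helpers, and the `cpu` tag is assumed to follow the same convention as `cu118`. A version with no tag at all says nothing either way.

```python
from typing import Optional


def local_build_tag(version: str) -> Optional[str]:
    """Return the part after '+' in a version string, or None if absent."""
    _, sep, local = version.partition("+")
    return local if sep else None


def looks_like_cpu_build(version: str) -> bool:
    """True only when the version carries an explicit 'cpu' local tag."""
    return local_build_tag(version) == "cpu"


if __name__ == "__main__":
    # Example strings only; on a real box, pass torch.__version__ instead.
    for example in ("2.0.1+cu118", "2.2.0+cpu", "2.2.0"):
        print(example, local_build_tag(example), looks_like_cpu_build(example))
```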

I don't think PDM solves that directly. What I do is have different lock files for different platforms (e.g. Linux/CUDA and macOS/Metal), but pyproject.toml lists only "torch".
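The per-platform lock file approach can be sketched as a small lookup. Everything named here is made up for illustration: the file names and the `pick_lock_file` helper are hypothetical, not something PDM defines.

```python
import platform
from typing import Optional

# Hypothetical lock file names, one per platform; pick whatever fits the repo.
LOCK_FILES = {
    "Linux": "pdm.linux-cuda.lock",
    "Darwin": "pdm.macos-metal.lock",
}


def pick_lock_file(system: Optional[str] = None) -> str:
    """Return the lock file name for the given (or current) OS."""
    system = system or platform.system()
    try:
        return LOCK_FILES[system]
    except KeyError:
        raise RuntimeError(f"no lock file configured for {system!r}") from None


if __name__ == "__main__":
    print(pick_lock_file())
```

The result would then be passed to the tool's lock file option. I believe PDM accepts `--lockfile` for this, but check `pdm --help` on your version before relying on it.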