| HN Mirror



	by jphoward 820 days ago
	Is anyone in the AI/ML area finding success with anything other than conda, where installation of CUDA/cuDNN is required? Although I often have to pip install a lot of packages, I find conda's nvidia/pytorch/conda-forge channels are still by far the easiest way to get a deep learning stack up and running, and so I just stick with conda environments. I've tried poetry in the past but getting the NVIDIA deep learning stack up and running was really tough.

9 comments

rmnclmnt 820 days ago

For anything related to CUDA/cuDNN, use one of NVIDIA's base Docker images. Then whether you use Conda / Pip / Poetry / Pipenv does not matter much. I'm not at all a Conda fan myself and avoid it like the plague.


Y_Y 820 days ago

What's surprising to me is that this isn't better known. The only reliable solution I've found is to go with the pytorch or deepstream images from NGC. Conda is probably a good idea for noobs who need CUDA installed for them on Windows, but otherwise I find it an endless source of finicky issues, especially for unsavvy ML scientists who are looking for a silver bullet for package management.

This link shows which package versions come in which Docker tag and is invaluable: https://docs.nvidia.com/deeplearning/frameworks/support-matr...


rmnclmnt 819 days ago

10 years ago, « Data Science » work past the experimental stage was performed by SWEs with a knack for applied maths. So investing in tooling to do things properly was a given.

Nowadays, most DS people only want to do ML at the experimental stage and get lost when things move to the engineering side. In their defense, the bare-minimum skills now include programming, containerization, CI/CD, etc. More experienced, Swiss-army-knife SWEs/MLEs have to educate the willing.

It was already the same 10 years ago with MATLAB dudes not wanting to get dirty with C/C++/ASM SIMD. History repeats itself, only at a faster pace.


godelski 820 days ago

Yes. I simply do

  python -m pip install torch torchvision

and it works. It used to not, but it's been fine for me for about a year now.

There's a very good chance I've installed CUDA on my system before this though, and usually cuDNN and some other packages, because this is part of my standard install. That also means I never run into the issue where a package is looking for nvcc.
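
For anyone unsure whether their pip-installed torch is actually using the GPU, or whether nvcc is on the PATH at all, here is a minimal sketch of a check. The function name `gpu_stack_report` is made up for illustration; it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it degrades gracefully when torch is not installed.

```python
import importlib.util
import shutil


def gpu_stack_report() -> dict:
    """Collect a few facts about the local PyTorch/CUDA setup.

    Hypothetical helper: does not fail if torch is missing.
    """
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "torch_cuda_build": None,   # CUDA version torch was built against, if any
        "cuda_available": False,    # whether torch can actually see a GPU
        "nvcc_path": shutil.which("nvcc"),  # None when no CUDA compiler is on PATH
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in gpu_stack_report().items():
        print(f"{key}: {value}")
```

A missing `nvcc_path` alongside `cuda_available: True` would match the situation described above: prebuilt wheels run fine, but anything that compiles CUDA extensions will still need the toolkit.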


misnome 820 days ago

I love poetry but have found it pretty hard once you depend on anything that doesn’t manage to get wheels on PyPI.

We make extensive use of conda/mamba to solve this, and are pretty happy with it, especially with conda-forge.


claytonjy 819 days ago

I have successfully transitioned an ML/AI team of seasoned researchers away from conda and to poetry. Some also use pyenv; I suspect a lot don't bother, but they may get bitten eventually.

It's definitely a learning curve, but it turns out every conda user has been bitten by the irreproducible tendencies of conda quite often. Nobody uses the conda env file; they just start an env and pip install things into it. They don't realize the base env has stuff, too, and conda envs are hierarchical rather than isolated. I know it's possible to use conda in an isolated and reproducible way, but I have yet to meet someone who does so.
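
One way to see how far an environment has drifted from whatever file was meant to describe it is to dump what the interpreter can actually import. This is only a rough sketch using the standard library's `importlib.metadata`; the helper name `installed_pins` is an assumption for illustration, and it says nothing about non-Python packages that conda may have installed.

```python
from importlib import metadata


def installed_pins() -> list[str]:
    """Return sorted 'name==version' pins for every distribution this interpreter can see."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print a snapshot that can be diffed against an env file or lock file.
    print("\n".join(installed_pins()))
```

Diffing this output between two machines that are supposed to share an environment is a quick way to spot the ad hoc pip installs mentioned above.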

So it hasn't been hard to pitch poetry to these folks, and while many complain about the learning curve they appreciate the outcomes.

We're a pytorch shop, and torch mostly just works with pip or poetry these days, as long as you skip the versions the torch maintainers mispackaged. We rarely need anything higher-level that only conda could install.

We really like having more than two dependency groups, as this allows us to keep research and production in the same repository: main, dev, and research. Researchers then contribute to the core library of a project, and research and production use the same code for running and evaluating models.


michaelmior 820 days ago

I use pipenv and I've found it to be much more usable than conda. For my use cases, it's generally faster and I've run into fewer dependency issues.


halflings 820 days ago

uv has been really awesome as a replacement for pip: https://github.com/astral-sh/uv

So fast it finally made virtual environments usable for me. But it's not (yet) a full replacement for conda, e.g. it won't install things outside of Python packages


northzen 820 days ago

How about prefix then? https://prefix.dev/blog/uv_in_pixi


samus 820 days ago

Pyenv just worked for me. I am actually using Fedora Silverblue and have GCC and the CUDA SDK available only inside a toolbox container. Therefore, I have to enter that toolbox to install things like FlashAttention.


abkfenris 819 days ago

Have you tried https://pixi.sh/ ? It brings Cargo/NPM/Poetry-like commands and lock files to the Conda ecosystem, and can now manage and lock PyPI dependencies alongside them by using uv under the hood.

I haven't been using anything CUDA, but the scientific geospatial stack is often a similar mess to install, and it's been handling it really well.


thangngoc89 820 days ago

I use poetry and direnv. Coming from node/npm, it feels natural for me to just do this. I really have no trouble installing PyTorch with poetry.


maleldil 819 days ago

How are you installing PyTorch with CUDA with Poetry? I stopped using Poetry because it wouldn't automatically get the CUDA version; instead, it would install the CPU version. I migrated to PDM, which does the right thing.


thangngoc89 819 days ago

Before CUDA 12.0 you had to specify an explicit source in pyproject.toml like this

    [tool.poetry.dependencies]
    python = ">=3.10,<3.12.0"
    torch = {version = "^2.0.1+cu118", source = "torch118"}
    torchvision = {version = "^0.15.2+cu118", source = "torch118"}

    [[tool.poetry.source]]
    name = "torch118"
    url = "https://download.pytorch.org/whl/cu118"
    priority = "explicit"

However, since CUDA 12.0 and PyTorch 2.1.0, you can just install as normal

    poetry add torch torchvision
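
To confirm which build either approach actually resolved, one option is to look at the local version label (the part after `+`) in the version string, such as the `+cu118` suffix pinned above. The sketch below is only an illustration: `build_flavor` and `describe` are hypothetical helper names, the string would typically come from `torch.__version__`, and the labels it recognizes (`cpu`, `cu…`, `rocm…`) are an assumption about current wheel naming rather than a guarantee.

```python
from typing import Optional


def build_flavor(version: str) -> Optional[str]:
    """Return the local version label after '+', e.g. 'cu118' or 'cpu', or None if absent."""
    _, sep, local = version.partition("+")
    return local if sep else None


def describe(version: str) -> str:
    """Turn a torch-style version string into a human-readable build description."""
    flavor = build_flavor(version)
    if flavor is None:
        return "no build tag in version string"
    if flavor == "cpu":
        return "CPU-only build"
    if flavor.startswith("cu"):
        return f"CUDA build ({flavor})"
    if flavor.startswith("rocm"):
        return f"ROCm build ({flavor})"
    return f"unrecognized build tag ({flavor})"


if __name__ == "__main__":
    for example in ("2.0.1+cu118", "2.2.0+cpu", "2.2.0"):
        print(example, "->", describe(example))
```

A string with no tag is inconclusive on its own, so pairing this with `torch.cuda.is_available()` is the safer check.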


maleldil 818 days ago

I stand corrected. I was familiar with the first option, which coupled the dependencies with the platform, whereas I wanted a CUDA version on Linux and a Metal version on macOS.

However, this works perfectly with Poetry 1.8 and Pytorch 2.2. I suppose the only problem is what PDM also does, where the lock file is platform-dependent. I'm not sure whether Poetry allows you to select a specific lock file, however.


claytonjy 819 days ago

Was this before torch 2.0? With the very notable exception of a few mispackaged versions, torch now includes all the relevant Nvidia libs, and I haven't seen it grab the CPU version on a GPU box yet, though I'm not sure what it looks for.

A notable open issue in poetry is that we can't currently specify one dependency on torch and have it grab the CPU version on some systems and the GPU version on others. Does PDM solve that?


maleldil 818 days ago

I don't think PDM solves that directly. What I do is have different lock files for different platforms (e.g. Linux/CUDA and macOS/Metal), but pyproject.toml lists only "torch".
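
A per-platform lock file setup like this could be driven by a small wrapper along the following lines. This is a sketch under stated assumptions: the lock file names are hypothetical (the thread does not say what they are called), and the printed command assumes PDM accepts a `--lockfile` option, which is worth checking against the PDM docs for your version.

```python
import platform

# Hypothetical file names; nothing in the thread says what the lock files are called.
LOCKFILES = {
    "Linux": "pdm.linux-cuda.lock",
    "Darwin": "pdm.macos-metal.lock",
}


def lockfile_for(system: str) -> str:
    """Map a platform.system() value to the lock file meant for that platform."""
    try:
        return LOCKFILES[system]
    except KeyError:
        raise ValueError(f"no lock file configured for platform {system!r}") from None


if __name__ == "__main__":
    name = lockfile_for(platform.system())
    # Assumed PDM invocation; printed rather than executed.
    print(f"pdm install --lockfile {name}")
```

pyproject.toml still lists only "torch"; the platform-specific resolution lives entirely in whichever lock file gets selected.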
