Hacker News
by npteljes 349 days ago
What I experienced is that AI is a nightmare on AMD under Linux. There are a myriad of custom things that one needs to do, and even then it just breaks after a while. It happened so often on my current setup (6600 XT) that I don't bother with local AI anymore, because the time investment is just not worth it.

It's not that I can't live like this; I still have the same card. But if I were looking to do anything AI locally with a new card, it certainly wouldn't be an AMD one.

3 comments

I don't have much experience with ROCm for large training runs, but NVIDIA is still shit with driver + CUDA version + other things. The only simplification comes from Ubuntu and other distros, which already do the heavy lifting by installing all the required components without much configuration.
Oh I'm sure. The thing is that with AMD I have the same luxury, and the wretched thing still doesn't work, or has regressions.
On Ubuntu, in my experience, installing the .deb version of the CUDA toolkit pretty much "just works".
I set up a deep learning station probably 5-10 years ago and ran into the exact same issue. After a week of pulling out my hair, I just bought an Nvidia card.
Are you referring to AI training, prediction/inference, or both? Could you give some examples of what had to be done and why? Thanks in advance.
Sure! I'm referring to setting up a1111's stable diffusion webui, and setting up Open WebUI.

Wrt a1111, it worked at one point (a year ago) after 2-3 hours of tinkering, then regressed to not working at all, not even from fresh installs on new, different Linux distros. I tried the main branch and the AMD-specific fork as well.

Wrt Open WebUI, it works, but the thing uses my CPU instead of the GPU.
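
For PyTorch-based tools such as a1111's webui, a quick way to tell whether work is silently falling back to the CPU is to ask PyTorch what it can see. Below is a minimal sketch, not a fix: the function name `describe_gpu_backend` is hypothetical, and it only covers PyTorch. Open WebUI may hand inference off to a separate backend, which this check does not inspect. The `HSA_OVERRIDE_GFX_VERSION` variable is included only because it is a commonly reported workaround for RDNA2 cards that ROCm does not officially list; whether it is relevant to the setup described above is an assumption.

```python
import os


def describe_gpu_backend():
    """Report whether PyTorch is installed and whether it can see a GPU."""
    info = {
        "torch_installed": False,
        "rocm_build": False,
        "gpu_visible": False,
        "device_name": None,
        # Commonly reported workaround variable for unlisted RDNA2 cards.
        # Whether a given setup needs it is an assumption, so only report it.
        "hsa_override": os.environ.get("HSA_OVERRIDE_GFX_VERSION"),
    }
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or its native libraries failed to load.
        return info

    info["torch_installed"] = True
    # ROCm builds set torch.version.hip; CUDA and CPU-only builds leave it None.
    info["rocm_build"] = getattr(torch.version, "hip", None) is not None
    # ROCm builds reuse the torch.cuda namespace, so this works on AMD too.
    info["gpu_visible"] = bool(torch.cuda.is_available())
    if info["gpu_visible"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_gpu_backend().items():
        print(f"{key}: {value}")
```

If `rocm_build` is true but `gpu_visible` is false, PyTorch was built for ROCm but cannot find a usable device, which would be consistent with the CPU fallback described above. If `rocm_build` is false on an AMD machine, a CPU-only or CUDA wheel was probably installed instead.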