by badsectoracula
421 days ago
If you don't mind going through the eldritch horror that is building ROCm from source[0], Qwen_Qwen3-30B-A3B-Q6_K (a 6-bit quantization of the LLM mentioned in the article, which in practice shouldn't be much different) works decently fast on an RX 7900 XTX using koboldcpp and llama.cpp. And by "decently fast" i mean "it writes faster than i can read".

If you're on Debian, AFAIK AMD is paying someone to experience the pain in your place, so that is an option if you're building something from scratch, but my openSUSE Tumbleweed installation predates the existence of llama.cpp by a few years and i'm not subjecting myself to the horror that is Python projects (mis)managed by AI developers[1] :-P.

EDIT: my mistake, ROCm isn't needed (or actually, supported) by koboldcpp, it uses Vulkan. ROCm is available via a fork. Still, it is fast with Vulkan too.

[0] ...and more than once, since it might break after some OS upgrade, like mine did

[1] ok, i did it once, because recently i wanted to try out some tool someone wrote that relied on some AI stuff and i was too stubborn to give up - i had to install Python from source in a Debian Docker container because some dependency 2-3 layers deep didn't compile with a newer minor release of Python. It convinced me to thank Georgi Gerganov yet again for making AI-related tooling that enables people to stick with C++
|
llama.cpp can be built using Debian-supplied libraries with the ROCm backend enabled.