| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by av_conk 500 days ago

I tried using ollama because I couldn't get ROCm working on my system with llama-cpp. Ollama bundles the ROCm libraries for you. I got around 50 tokens per second with that setup.

I tried llama-cpp with the Vulkan backend and doubled the amount of tokens per second. I was under the impression ROCm is superior to Vulkan, so I was confused about the result.

In any case, I've stuck with llama-cpp.

1 comments

buyucu 500 days ago

It depends on your GPU. Vulkan is well-supported by essentially all GPUs. AMD support ROCm well for their datacenter GPUs, but support for consumer hardware has not been as good.

link