|
|
|
|
|
by sofixa
162 days ago
|
|
You can run vLLM with AMD GPUs supported by ROCm: https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/infer... However from experience with an AMD Strix Halo, a couple of caveats: it's drastically slower than Ollama (tested over a few weeks, always using the official AMD vLLM nightly releases), and not all GPUs were supported for all models (but that has been fixed). |
|
If you want more performance, you could try running llama.cpp directly or use the prebuilt lemonade nightlies.