by bildung
60 days ago
I currently run qwen3.5-122B (Q4) on a Strix Halo (Bosgame M5) and am pretty happy with it. It's obviously much slower than hosted models: I get ~20 t/s with an empty context, dropping to about 14 t/s with 100k of context filled. No tuning at all, just apt install rocm and rebuilding llama.cpp every week or so.
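
For a sense of scale, here is a rough back-of-the-envelope sketch of what those two throughput numbers mean in practice. The 1000-token reply length is an assumption picked purely for illustration, and the calculation covers generation only; it ignores prompt processing time, which is not measured above.

```python
# Throughput figures quoted in the comment (tokens per second).
EMPTY_CONTEXT_TPS = 20.0
FULL_CONTEXT_TPS = 14.0  # with roughly 100k of context filled

# Hypothetical reply length, assumed only for illustration.
REPLY_TOKENS = 1000


def generation_seconds(tokens: int, tps: float) -> float:
    """Time to generate `tokens` tokens at a steady `tps` tokens/second."""
    return tokens / tps


# Relative drop in generation speed between empty and filled context.
slowdown = 1.0 - FULL_CONTEXT_TPS / EMPTY_CONTEXT_TPS

print(f"slowdown with full context: {slowdown:.0%}")
print(f"empty context: {generation_seconds(REPLY_TOKENS, EMPTY_CONTEXT_TPS):.0f} s")
print(f"full context:  {generation_seconds(REPLY_TOKENS, FULL_CONTEXT_TPS):.0f} s")
```

So under that assumption it is roughly a 30% drop: about 50 s versus about 71 s for the same reply.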