|
|
|
|
|
by nyrikki
126 days ago
|
|
It is crazy to me that it is that slow, 4 bit quants don't lose much with Qwen3 coder next and unsloth/Qwen3-Coder-Next-UD-Q4_K_XL gets 32 tps with a 3090 (24gb) as a VM with 256k context size with llama.cpp Same with unsloth/gpt-oss-120b-GGUF:F16 gets 25 tps and gpt-oss20b gets 195 tps!!! The advantage is that you can use the APU for booting, and pass through the GPU to a VM, and have nice safer VMs for agents at the same time while using DDR4 IMHO. |
|