by mappu
9 days ago
I'm running Qwen3.6-35B-A3B (Unsloth Q4_K_XL) on a very ordinary desktop PC (32GB DDR5, 8GB Radeon 6600XT) and getting a useful 15-20 tok/sec out of it. The MoE architecture and automatic offloading between system RAM and VRAM are just fantastic. The Qwen3.6-27B is unbearably slow by comparison, as it doesn't fit in VRAM, whereas the MoE is very easy to run. It is also extremely nice that you can now just `apt install llama.cpp libggml0-backend-vulkan`.
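A rough back-of-envelope for why the MoE feels so much lighter, as a sketch only. It assumes token generation is limited by how many weight bytes have to be read per token, that "A3B" means roughly 3B parameters are active per token, that the 27B model reads all of its weights for every token, and it ignores the KV cache and whatever part sits in VRAM. The bits-per-weight and bandwidth figures are illustrative guesses, not measurements from my machine.

```python
def weight_bytes_per_token(active_params_billion, bits_per_weight):
    """Approximate bytes of weights read to generate one token."""
    return active_params_billion * 1e9 * bits_per_weight / 8


def tokens_per_sec_ceiling(active_params_billion, bits_per_weight, bandwidth_gb_s):
    """Upper bound on tok/sec if weight reads were the only cost."""
    bytes_per_token = weight_bytes_per_token(active_params_billion, bits_per_weight)
    return bandwidth_gb_s * 1e9 / bytes_per_token


BITS_PER_WEIGHT = 4.5   # assumption: rough average for a 4-bit quant
RAM_BANDWIDTH = 60.0    # assumption: GB/s, ballpark for dual-channel DDR5

# Assumption: ~3B active parameters per token for the MoE ("A3B").
moe_ceiling = tokens_per_sec_ceiling(3, BITS_PER_WEIGHT, RAM_BANDWIDTH)
# Assumption: all 27B parameters are read for every token.
dense_ceiling = tokens_per_sec_ceiling(27, BITS_PER_WEIGHT, RAM_BANDWIDTH)

print(f"MoE ceiling:  {moe_ceiling:.1f} tok/sec")
print(f"27B ceiling:  {dense_ceiling:.1f} tok/sec")
print(f"ratio:        {moe_ceiling / dense_ceiling:.1f}x")
```

Whatever the real bandwidth is, the ratio between the two only depends on the active parameter counts, which is the point: far fewer bytes have to move per token with the MoE.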
|
Yesterday I downloaded Gemma4-26B with Ollama on a rather dated desktop with a 1070 8GB, 32GB of RAM and a Core i5-9400.
I gave it a photo of my water meter and asked it to read the value and serial number. It was far from instant, but it was easily under 3 minutes and the result was correct.
Earlier, around February, I tried the same photo with Gemma3 on the same hardware and the results were bad.