With llama.cpp and offloading the non-active experts (from the MoE architecture) to CPU RAM, you can easily run Qwen 3.6 35B at 50 tok/s on 8-12 GB of VRAM.
The KV cache is a few GB and the experts are ~3-5 GB (assuming a Q8 quant from Unsloth, for example).
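
If you want to sanity-check that against your own card, here is a rough back-of-the-envelope sketch in Python. The figures are just the ballparks above ("a few GB" is taken as 2-4 GB), and the 1 GB overhead for compute buffers is my own guess, so treat it as an estimate and not a measurement:

```python
# Rough GPU memory budget from the ballpark figures above.
# All values are in GB and are assumptions, not measurements.

def vram_budget_gb(kv_cache_gb, resident_weights_gb, overhead_gb=1.0):
    """Sum everything that has to stay on the GPU.

    overhead_gb is a guessed allowance for compute buffers and the like.
    """
    return kv_cache_gb + resident_weights_gb + overhead_gb


def fits(total_gb, vram_gb):
    """True if the estimated footprint fits in the given amount of VRAM."""
    return total_gb <= vram_gb


# Low and high ends of the ranges quoted above.
low = vram_budget_gb(kv_cache_gb=2.0, resident_weights_gb=3.0)
high = vram_budget_gb(kv_cache_gb=4.0, resident_weights_gb=5.0)

print(f"estimated GPU footprint: {low:.1f}-{high:.1f} GB")
for card_gb in (8, 12):
    print(f"{card_gb} GB card: low end fits={fits(low, card_gb)}, "
          f"high end fits={fits(high, card_gb)}")
```

With these guesses, the high end is tight on an 8 GB card, so you would trim context length (smaller KV cache) there.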
You can scroll through r/LocalLLaMA and find tons of people getting usable speeds out of Qwen 35B.