by kanemcgrath
56 days ago
I have been using Qwen3.5-35B-A3B a lot in local testing, and it is by far the most capable model that fits on my machine.
I think quantization technology has really upped its game around these models, and two quants blew me away: first Mudler APEX-I-Quality, then later Byteshape Q3_K_S-3.40bpw.
Both made claims that seemed too good to be true, but I couldn't find any traces of lobotomization while running long agent coding loops.
With the Byteshape quant I am up to 40+ t/s, a speed that makes agents much more pleasant.
On an RTX 3060 12GB with 32GB of system RAM, I went from maxing out all my available memory to having about 14GB to spare.
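
A rough back-of-the-envelope sketch of why a ~3.4 bpw quant frees up that much memory. It assumes the nominal 35B total parameter count implied by the model name and a uniform bits-per-weight average. It ignores the KV cache, runtime buffers, and file metadata, so real usage will differ. The helper name `quant_weight_gb` is made up for illustration.

```python
def quant_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal GB.

    size_bytes = n_params * bits_per_weight / 8
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal total parameter count taken from the "35B" in the model name.
N_PARAMS = 35e9

# Compare unquantized 16-bit weights with the 3.40 bpw quant mentioned above.
for label, bpw in [("16-bit", 16.0), ("3.40 bpw", 3.40)]:
    print(f"{label}: ~{quant_weight_gb(N_PARAMS, bpw):.1f} GB of weights")
```

Under those assumptions the weights alone come to roughly 15 GB at 3.40 bpw, which would fit inside the 12GB of VRAM plus 32GB of system RAM with room left over.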
Unsloth and Byteshape are just using and highlighting features that have been available the whole time. I am very invested in figuring out a solution to this dispute, or some way to get the new quants upstreamed.