by kroaton
76 days ago
For autocomplete, Qwen 3.5 9B should be enough, even at Q4_K_M.
The upcoming coding/math Omnicoder-2 finetune might be useful (it should be released in a few days). Either that, or just load up Qwen3.5-35B-A3B-Q4_K_S.
I'm serving it at about 40-50 t/s on an RTX 4070 Super 12GB + 64GB of RAM. The weights are 20.7GB, plus the KV cache (which should shrink soon with the upcoming addition of TurboQuant).
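To put rough numbers on that split, here is a back-of-the-envelope sketch in Python. It only uses the figures quoted above (20.7GB of weights, 12GB of VRAM, 64GB of RAM); the helper name and the optional VRAM reserve are hypothetical, and it ignores the KV cache and runtime overhead, so treat the result as a lower bound rather than a measurement.

```python
def min_ram_offload_gb(weights_gb: float, vram_gb: float,
                       vram_reserved_gb: float = 0.0) -> float:
    """Lower bound on the weight data that has to sit in system RAM.

    vram_reserved_gb is a hypothetical allowance for KV cache, display,
    and runtime buffers; it is an assumption, not a measured value.
    """
    usable_vram = max(vram_gb - vram_reserved_gb, 0.0)
    return max(weights_gb - usable_vram, 0.0)


WEIGHTS_GB = 20.7  # quoted weight size
VRAM_GB = 12.0     # quoted GPU memory
RAM_GB = 64.0      # quoted system memory

# Best case: every byte of VRAM holds weights.
offload = min_ram_offload_gb(WEIGHTS_GB, VRAM_GB)
print(f"At least {offload:.1f} GB of weights must live in system RAM")
print(f"That fits in {RAM_GB:.0f} GB of RAM: {offload <= RAM_GB}")

# Same estimate with an assumed 2 GB of VRAM kept free for other uses.
offload_reserved = min_ram_offload_gb(WEIGHTS_GB, VRAM_GB, vram_reserved_gb=2.0)
print(f"With 2 GB reserved: {offload_reserved:.1f} GB offloaded")
```

So at minimum around 8.7GB of the weights end up in system RAM on this setup, and more once the KV cache takes its share of VRAM.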