by alfiedotwtf
76 days ago
I haven’t tried any Qwen yet, but so far I’m sticking with gpt-oss-20B. In terms of what I’m using, I’ve looked at anything that will fit on a MacBook Pro with 32 GB RAM (so with shared memory): LFM2, Llama, Mistral, Ministral, Devstral, Phi, and Nemotron. As for quantisation, I aim for the biggest that will fit while also not being too slow, so it all depends on the model. But I’ll skip a model if I can’t at least use a Q4_K_M. I also bump my context to at least 32K, because tooling sucks when the tool definitions themselves come close to 4096 tokens! I can’t wait for RAM prices to come down!
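For anyone wanting to put rough numbers on "the biggest that will fit", here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the bits-per-weight figures are ballpark values for llama.cpp-style quant types, the function names are made up, and the model shape (20B dense, 32 layers, 8 KV heads, head dimension 128) is hypothetical rather than any specific model from the list above. It also ignores runtime overhead and anything else competing for shared memory.

```python
# Rough sizing only; every constant here is an assumption, not a measured value.
# Approximate bits per weight for some llama.cpp-style quant types (ballpark).
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.85}


def weights_gib(n_params_billion, quant):
    """Approximate size of the quantised weights in GiB."""
    total_bits = n_params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 2**30


def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Approximate KV cache size in GiB (K and V, 16-bit elements by default)."""
    elems = 2 * n_layers * n_kv_heads * head_dim * ctx_len
    return elems * bytes_per_elem / 2**30


def biggest_quant_that_fits(n_params_billion, kv_gib, budget_gib):
    """Return the largest quant whose weights plus KV cache fit the budget, or None."""
    by_size = sorted(BITS_PER_WEIGHT, key=BITS_PER_WEIGHT.get, reverse=True)
    for quant in by_size:
        if weights_gib(n_params_billion, quant) + kv_gib <= budget_gib:
            return quant
    return None


if __name__ == "__main__":
    # Hypothetical dense 20B model at 32K context, with an assumed 22 GiB
    # of the 32 GB actually usable for the model.
    kv = kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128, ctx_len=32 * 1024)
    print(f"KV cache: {kv:.1f} GiB")
    print("Largest quant that fits:", biggest_quant_that_fits(20, kv, budget_gib=22))
```

The point is just that the context length eats into the same budget as the weights, so going from 4K to 32K context can push you down a quant level. Whether the result is fast enough is a separate question this sketch doesn't try to answer.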