by walrus01
22 days ago
I personally find any model smaller than something like Qwen 3.6 35B-A3B (8-bit quantization, about 49GB of memory when loaded into llama.cpp) too "stupid" for reliable use. I would much rather offload the model to a system sitting under my desk in my home office, accessible via VPN, than run it on my laptop and risk depending on an unreliable, flaky tool just for the convenience of having it on the hardware in my lap. I pay very little attention to 8B (or even smaller) models these days, and I don't feel like I'm missing much.