by 4b6442477b1280b
324 days ago
with quantization, 20B fits effortlessly in 24 GB. With quantization + CPU offloading, non-thinking models run kind of fine (at about 2-5 tokens per second) even with 8 GB of VRAM. Sure, it would be great if we could have models in all sizes imaginable (7/13/24/32/70/100+/1000+), but 20B and 120B are great.
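
For a rough sense of the arithmetic behind these sizes, here is a back-of-the-envelope sketch. It assumes every weight is stored at a single uniform bit width and counts only the weights, so the KV cache, activations, and runtime overhead come on top. The function name `weight_gib` and the bit widths are illustrative choices of mine, not anything tied to a specific model or runtime.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Assumption: all parameters use the same bit width. Real quantized
    checkpoints often mix precisions, so treat this as a rough estimate.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Compare the two sizes mentioned above at a few common bit widths.
    for size in (20, 120):
        for bits in (16, 8, 4):
            print(f"{size}B @ {bits}-bit: ~{weight_gib(size, bits):.1f} GiB")
```

By this estimate a 20B model at 4 bits per weight needs a bit under 10 GiB for weights, which leaves headroom on a 24 GB card, while the same model at 16 bits would not fit without offloading.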