by
orangepanda
563 days ago
Or maybe even middle-class plebeian 24 GB rigs?
griomnib
563 days ago
At that point, just run the 8B model.
pulse7
563 days ago
Or wait for the IQ2_M quantization of the 70B model, which you can run very fast on 24 GB of VRAM with a context size of 4096...
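For a rough sense of how tight that fit is, here is a back-of-the-envelope sketch. The numbers are assumptions, not measurements: roughly 2.7 bits per weight as the nominal figure for IQ2_M, and a Llama-style 70B layout (80 layers, 8 KV heads, head dimension 128) with a 16-bit KV cache. The thread does not name the exact model, and real model files usually come out somewhat larger because some tensors are kept at higher precision.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GB; the leading 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_value / 1e9


# Assumed values (not from the thread): ~2.7 bits/weight for IQ2_M,
# and a Llama-style 70B architecture with grouped-query attention.
w = weights_gb(70e9, 2.7)
kv = kv_cache_gb(n_layers=80, n_kv_heads=8, head_dim=128, context=4096)

print(f"weights ~{w:.1f} GB, KV cache ~{kv:.2f} GB, total ~{w + kv:.1f} GB")
```

Under these assumptions the total lands close to the capacity of a 24 GB card before counting compute buffers, so whether it fits in practice likely depends on the exact file size and runtime overhead.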
griomnib
563 days ago
At some point there’s so much degradation from quantizing that I think 8B is going to be better for many tasks.