Hacker News
by
pulse7
562 days ago
Or wait for the IQ2_M quantization of 70B, which you can run very fast on 24GB VRAM with a context size of 4096...
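A rough back-of-the-envelope check of the 24GB claim, as a sketch only. Everything numeric here is an assumption that does not come from the comment: roughly 2.7 bits per weight as the nominal figure for IQ2_M, 70e9 parameters, a Llama-style 70B layout (80 layers, 8 KV heads, head dimension 128), and an fp16 KV cache. The function names are hypothetical.

```python
# Rough VRAM estimate for a quantized 70B model. All constants are assumptions.

GIB = 2**30


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB (factor 2 covers keys and values)."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_value / GIB


# Assumed: 70e9 parameters at ~2.7 bits per weight (nominal IQ2_M figure).
weights = weight_gib(70e9, 2.7)

# Assumed: Llama-style 70B shape, fp16 cache, context size 4096 from the comment.
kv = kv_cache_gib(n_layers=80, n_kv_heads=8, head_dim=128, context=4096)

total = weights + kv
print(f"weights ~{weights:.2f} GiB, KV cache ~{kv:.2f} GiB, total ~{total:.2f} GiB")
```

Under these assumptions the total lands just under 24 GiB, before counting compute buffers or anything else resident on the GPU, so the fit is tight rather than comfortable.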
1 comment
griomnib
562 days ago
At some point there’s so much degradation from quantizing that I think 8B is going to be better for many tasks.