HN Mirror

Hacker News


	by orangepanda 563 days ago
	Or maybe even middle-class plebeian 24 GB rigs?

1 comment

At that point, just run 8B.

Or wait for the IQ2_M quantization of 70B, which you can run very fast on 24 GB of VRAM with a context size of 4096...

At some point, though, quantizing degrades the model so much that I think 8B is going to be better for many tasks.
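
For a rough sense of why a 70B IQ2_M quant is borderline on a 24 GB card, here is a back-of-the-envelope sketch in Python. Everything numeric in it is an assumption rather than something stated in the thread: roughly 2.7 bits per weight for IQ2_M, a nominal 70e9 parameters, a Llama-style 70B layout (80 layers, 8 KV heads, head dimension 128), an fp16 KV cache, and a card whose "24 GB" is really 24 GiB. It also ignores compute buffers and runtime overhead, so real headroom will be smaller.

```python
# Rough VRAM estimate for a quantized model plus its KV cache.
# All figures below are ASSUMPTIONS for illustration, not measurements.

GIB_IN_GB = 1024**3 / 1e9  # one GiB expressed in decimal GB


def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """Size of the KV cache in decimal GB (keys and values, hence the 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem / 1e9


def fits(total_gb: float, vram_gib: float = 24.0) -> bool:
    """True if the estimate is below the card's capacity (overhead ignored)."""
    return total_gb < vram_gib * GIB_IN_GB


if __name__ == "__main__":
    # Assumed: 70e9 parameters at ~2.7 bits per weight (IQ2_M).
    w = weights_gb(70e9, 2.7)
    # Assumed: 80 layers, 8 KV heads, head dim 128, fp16 cache, 4096 context.
    kv = kv_cache_gb(80, 8, 128, 4096)
    total = w + kv
    print(f"weights  ~{w:.2f} GB")
    print(f"KV cache ~{kv:.2f} GB")
    print(f"total    ~{total:.2f} GB, fits in 24 GiB: {fits(total)}")
```

Under these assumptions the weights dominate and the 4096-token cache adds a bit over 1 GB, which leaves very little slack. That is consistent with the point that 70B only squeezes onto 24 GB at an aggressive quantization level.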