Hacker News


	by teilo 891 days ago
	I'm running it on an M2 Max with 96GB, and have plenty of room to spare. And it's fast. Faster than I can get responses from ChatGPT.


How many tokens/s? Which quantization? If you could test Q4_K_M and Q3_K_M, it would be interesting to hear how the M2 Max does!

No quantization (8_0). The full 48GB model. As for token count, I haven't tested it on more than 200 or so.

Isn’t 8_0 8-bit quantization?

Sorry. That was a major brain fart. Yes, 8-bit quantization, and it's using 49GB of RAM.
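The size figures in this exchange follow from simple arithmetic: weight storage is roughly parameter count times bits per weight, divided by 8. Below is a minimal Python sketch under stated assumptions. It assumes llama.cpp's Q8_0 layout (each block of 32 weights is stored as 32 int8 values plus one fp16 scale, giving 8.5 bits per weight), and it reads "48GB" as 48 × 10^9 bytes. The thread never names the model, so the parameter count is only what that file size implies, not a confirmed figure. The helper names are hypothetical.

```python
# Rough size arithmetic for a quantized model file:
#   bytes ~= n_params * bits_per_weight / 8
# Assumption: llama.cpp's Q8_0 stores each block of 32 weights as
# 32 int8 values plus one fp16 scale, i.e. (32*8 + 16) / 32 = 8.5 bits per weight.

Q8_0_BLOCK_WEIGHTS = 32
Q8_0_BITS_PER_WEIGHT = (Q8_0_BLOCK_WEIGHTS * 8 + 16) / Q8_0_BLOCK_WEIGHTS  # 8.5


def model_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in bytes (ignores file metadata and the KV cache)."""
    return n_params * bits_per_weight / 8


def implied_params(size_bytes: float, bits_per_weight: float) -> float:
    """Invert model_bytes: how many weights a file of this size holds."""
    return size_bytes * 8 / bits_per_weight


if __name__ == "__main__":
    size = 48e9  # the "48GB" from the thread, read as decimal gigabytes (assumption)
    n = implied_params(size, Q8_0_BITS_PER_WEIGHT)
    print(f"implied parameter count: {n / 1e9:.1f}B")
    # The same weights stored at 16 bits each, for comparison.
    print(f"16-bit size: {model_bytes(n, 16) / 1e9:.1f} GB")
```

Run as a script, this prints an implied parameter count of about 45.2B and a 16-bit size of about 90.4 GB. That illustrates why "8_0" is still quantization: the unquantized 16-bit weights would take nearly twice the space, with little headroom left on a 96GB machine.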