Hacker News
by
coder543
896 days ago
How many tokens/s? Which quantization? If you could test Q4_K_M and Q3_K_M, it would be interesting to hear how the M2 Max does!
teilo
896 days ago
No quantization (8_0). The full 48GB model. As for token count, I haven't tested it on more than 200 or so.
pilotneko
896 days ago
Isn’t 8_0 8-bit quantization?
teilo
894 days ago
Sorry, that was a major brain fart. Yes, it's 8-bit quantization, and it uses 49 GB of RAM.
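For anyone wanting to sanity-check figures like these, here is a rough back-of-the-envelope sketch in Python. It assumes the Q8_0 layout stores each block of 32 weights as 32 int8 values plus one 16-bit scale (34 bytes per block), and that "GB" means 10^9 bytes. The function names are hypothetical, the model in question is not identified here, and the result ignores any tensors kept at higher precision as well as runtime overhead such as the KV cache, which may partly explain why RAM use can exceed the file size.

```python
# Rough size arithmetic for a block-quantized model (a sketch, not a measurement).
# Assumption: Q8_0 packs 32 weights per block as 32 int8 values + one fp16 scale.

BLOCK_WEIGHTS = 32
BLOCK_BYTES = 32 * 1 + 2  # 32 one-byte weights plus a 2-byte scale


def bits_per_weight(block_bytes: int = BLOCK_BYTES,
                    block_weights: int = BLOCK_WEIGHTS) -> float:
    """Effective bits stored per weight, including the per-block scale."""
    return block_bytes * 8 / block_weights


def estimated_size_gb(n_params: float, bpw: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) for n_params weights."""
    return n_params * bpw / 8 / 1e9


def implied_params(size_gb: float, bpw: float) -> float:
    """Invert the estimate: parameter count implied by a given size in GB."""
    return size_gb * 1e9 * 8 / bpw


if __name__ == "__main__":
    bpw = bits_per_weight()
    print(f"Q8_0 effective bits per weight: {bpw}")
    params = implied_params(48, bpw)
    print(f"48 GB at that rate implies roughly {params / 1e9:.1f}B parameters")
```

The same functions can be reused for other formats by passing a different bits-per-weight value, but those values would need to come from the format's own documentation rather than from this sketch.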