Hacker News
by
irusensei
643 days ago
You are probably swapping. On an M3 Max with similar memory bandwidth, output is around 4 tokens/s, which is roughly on par with most people's reading speed. Try different quants.
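For a rough sense of why quantization and memory bandwidth matter here, a back-of-the-envelope sketch might help. It assumes decoding is memory-bound, so each generated token reads every weight once and throughput is capped at bandwidth divided by model size. The 400 GB/s bandwidth, the 70B parameter count, and the 64 GB memory figure are illustrative assumptions, not numbers from this thread, and real throughput will be lower than this ceiling.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: billions of params times bytes per param."""
    return params_billion * bits_per_weight / 8


def est_tokens_per_sec(bandwidth_gb_s: float, params_billion: float,
                       bits_per_weight: float) -> float:
    """Upper bound on decode speed if every weight is read once per token."""
    return bandwidth_gb_s / model_size_gb(params_billion, bits_per_weight)


# All three values below are assumptions for illustration only.
BANDWIDTH_GB_S = 400.0  # assumed memory bandwidth
PARAMS_BILLION = 70.0   # hypothetical model size
MEMORY_GB = 64.0        # hypothetical unified memory

for bits in (16, 8, 4):
    size = model_size_gb(PARAMS_BILLION, bits)
    speed = est_tokens_per_sec(BANDWIDTH_GB_S, PARAMS_BILLION, bits)
    # Weights larger than memory force swapping, which makes this bound meaningless.
    note = "likely swapping" if size > MEMORY_GB else "fits in memory"
    print(f"{bits}-bit: ~{size:.0f} GB, at most ~{speed:.1f} tokens/s ({note})")
```

Under these assumptions, a lower-bit quant both shrinks the weights enough to avoid swapping and raises the bandwidth ceiling, which is why trying different quants is worth it.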
1 comment
steve_adams_86
643 days ago
I'm on an M2 Max, so I shouldn't be too far behind. To be honest, I'm not actually sure how the model I'm using was quantized. I'll give it a try.