by irusensei 643 days ago
You are probably swapping. On an M3 Max with similar memory bandwidth, the output is around 4 t/s, which is roughly on par with most people's reading speed. Try different quants.
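For a rough sanity check, here is a back-of-the-envelope sketch of why quantization matters. It assumes the common rule of thumb that generation is memory-bandwidth bound, so every token requires reading all the weights once. The function names, the 0.75 usable-memory fraction, and the example numbers (70B parameters, 400 GB/s, 32 GB of RAM) are illustrative assumptions, not measurements from any particular machine or model.

```python
def model_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes).

    Ignores the KV cache, activations, and runtime overhead.
    """
    return n_params_billion * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s: float, size_gb: float) -> float:
    """Rough upper bound on generation speed, assuming each token
    requires streaming all weights from memory exactly once."""
    return bandwidth_gb_s / size_gb


def likely_swapping(size_gb: float, ram_gb: float, usable_fraction: float = 0.75) -> bool:
    """Guess whether the weights exceed the memory available to the model.

    usable_fraction is an assumed share of RAM the model can actually use;
    the real limit depends on the OS and on what else is running.
    """
    return size_gb > ram_gb * usable_fraction


# Hypothetical example: a 70B-parameter model at 4 bits per weight.
size = model_size_gb(70, 4)
bound = max_tokens_per_second(400, size)   # assumed 400 GB/s bandwidth
swapping = likely_swapping(size, ram_gb=32)
```

If the estimated size does not fit in memory, throughput tends to fall far below the bandwidth bound, and a smaller quant is the first thing to try.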

I'm on an M2 Max, so I shouldn't be too far behind. To be honest, I'm not actually sure how the model I'm using was quantized. I'll give it a try.