Hacker News
by umangsh 1197 days ago
30B fp16 takes ~500 ms/token on an M2 Max with 96GB. Interestingly, that's about the same speed as 65B with q4 quantization.

65B fp16 is ungodly slow, ~300,000 ms/token on the same machine.
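
For anyone who wants to sanity-check these numbers, here is a rough back-of-the-envelope sketch. It assumes nominal parameter counts (30e9 and 65e9), exactly 16 or 4 bits per weight, and counts weights only; real quantized formats carry some extra per-block overhead, and the KV cache and OS need memory too. The function names are made up for illustration. One plausible reading, not something measured here, is that 65B fp16 simply does not fit in 96GB and ends up swapping.

```python
# Rough estimate of weight memory and throughput for the three cases above.
# Assumptions (not from the original comment): nominal parameter counts,
# exactly 16 or 4 bits per weight, weights only (no KV cache, no overhead).

RAM_GB = 96  # unified memory on the machine mentioned above


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def tokens_per_second(ms_per_token: float) -> float:
    """Convert a latency in ms/token to throughput in tokens/s."""
    return 1000 / ms_per_token


cases = [
    # (label, params in billions, bits per weight, reported ms/token)
    ("30B fp16", 30, 16, 500),
    ("65B q4", 65, 4, 500),
    ("65B fp16", 65, 16, 300_000),
]

for label, params, bits, ms in cases:
    size = weight_gb(params, bits)
    fits = "fits" if size < RAM_GB else "does NOT fit"
    print(f"{label}: ~{size:.1f} GB of weights ({fits} in {RAM_GB} GB), "
          f"{tokens_per_second(ms):.4f} tokens/s")
```

Under these assumptions, 30B fp16 is about 60 GB and 65B q4 about 32.5 GB, both under 96 GB, while 65B fp16 is about 130 GB. At 300,000 ms/token, that last case works out to five minutes per token.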