by umangsh 1197 days ago
30B fp16 takes ~500 ms/token on an M2 Max with 96GB. Interestingly, that's the same speed as 65B quantized to q4.
65B fp16 is ungodly slow, ~300,000 ms/token on the same machine.
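
For a rough sense of why the numbers might land this way, here is a back-of-the-envelope sketch of the weight sizes. It assumes the nominal parameter counts (30e9 and 65e9), treats q4 as exactly 4 bits per weight (real q4 formats carry some extra scale data), and ignores the KV cache and runtime overhead. The helper names are made up for illustration. If the 65B fp16 weights don't fit in 96GB, paging from disk would be one plausible explanation for the slowdown, but that is a guess, not something measured here.

```python
RAM_GB = 96  # M2 Max unified memory, from the comment above

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency into throughput."""
    return 1000.0 / ms_per_token

# (label, parameters in billions, assumed bits per weight, reported ms/token)
configs = [
    ("30B fp16", 30, 16, 500),
    ("65B q4", 65, 4, 500),
    ("65B fp16", 65, 16, 300_000),
]

for label, params_b, bits, ms in configs:
    size = weight_gb(params_b, bits)
    fits = "fits" if size <= RAM_GB else "does not fit"
    print(f"{label}: ~{size:.1f} GB of weights ({fits} in {RAM_GB} GB), "
          f"{tokens_per_second(ms):.4f} tokens/s")
```

Under these assumptions 30B fp16 and 65B q4 both sit well under 96GB, while 65B fp16 is the only one whose weights alone exceed it.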