Hacker News
by detrites 1200 days ago
What's the tokens/s on those?

With 16 threads, about 140 ms per token for 30B and about 300 ms per token for 65B.
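
Since the question asked for tokens/s, here is a quick conversion sketch for the latencies quoted above. The helper name `tokens_per_second` is made up for illustration and is not part of any tool; the figures are approximate to begin with, so treat the results as rough.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput in tokens/s."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


# Approximate latencies from the comment above (16 threads).
# The dictionary keys are only labels for the two model sizes.
latencies_ms = {"30B": 140.0, "65B": 300.0}

for model, ms in latencies_ms.items():
    print(f"{model}: {tokens_per_second(ms):.1f} tokens/s")
# prints:
# 30B: 7.1 tokens/s
# 65B: 3.3 tokens/s
```

So roughly 7 tokens/s for 30B and a bit over 3 tokens/s for 65B.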

I should also mention that 65B should be able to run on 64GB systems. Total system memory consumption on M1 Ultra is about 67GB when running nothing else.