by benenglish 1203 days ago
Wondering how difficult this would be to get running on an M1 Max?
2 comments

I got one token every 8 minutes or so.
Using which model? On a pretty mid-range 11th-gen i5 I'm getting 0.35 tokens/s, using the 7B model. Haven't tried the bigger models.
Is that good? Not good?
A token is approximately 4 characters. So, four characters per 8 minutes is pretty slow. This comment is around 40 tokens, so it would take about 300 minutes to generate, if I were an AI.
Usually you want tokens per second, not seconds per token. So it's a bad sign.
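For anyone who wants to redo the arithmetic above, here is a rough sketch in Python. It assumes the 4-characters-per-token rule of thumb from this thread (real tokenizers vary), and the function names are made up for illustration, not taken from any library.

```python
# Back-of-the-envelope throughput math, assuming ~4 characters per token.
# All names here are hypothetical helpers, not part of any real library.

def tokens_for(chars: int, chars_per_token: float = 4.0) -> float:
    """Approximate token count for a piece of text of the given length."""
    return chars / chars_per_token


def minutes_to_generate(chars: int, seconds_per_token: float,
                        chars_per_token: float = 4.0) -> float:
    """Estimated wall-clock minutes to generate `chars` characters."""
    return tokens_for(chars, chars_per_token) * seconds_per_token / 60.0


def tokens_per_second(seconds_per_token: float) -> float:
    """Convert seconds-per-token into the more usual tokens-per-second."""
    return 1.0 / seconds_per_token


# One token every 8 minutes, as reported in this thread.
slow = 8 * 60.0
# 0.35 tokens/s, as reported for the 7B model on the i5.
i5 = 1.0 / 0.35

comment_chars = 153  # assumed length of a short comment
print(f"slow setup: {tokens_per_second(slow):.4f} tokens/s, "
      f"{minutes_to_generate(comment_chars, slow):.0f} min per comment")
print(f"i5 setup:   {tokens_per_second(i5):.2f} tokens/s, "
      f"{minutes_to_generate(comment_chars, i5):.1f} min per comment")
```

Under those assumptions, a 153-character comment is about 38 tokens, so it takes hours at one token per 8 minutes and a couple of minutes at 0.35 tokens/s.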
Another commenter posted a fork that does it: https://news.ycombinator.com/item?id=35067469

Per the README, it looks like there are a few bugs to figure out, in case anyone here is a PyTorch expert.