Hacker News
by a_wild_dandan 509 days ago
Running a 680-billion-parameter frontier model on a few Macs (at 13 tok/s!) is nuts. That's two years after ChatGPT was released. That rate of progress just blows my mind.

And those are M2 Ultras. The M4 Ultra is about to drop in the next few weeks or months, and I'm guessing it might come in higher RAM configs, so you could probably run the same 680B model on two of those beasts.

The higher-performing chips, with one fewer interconnect hop, are going to give you significantly higher t/s.
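The "two of those beasts" guess mostly comes down to memory arithmetic: weights take roughly (parameters × bits per parameter / 8) bytes, and that has to fit across the machines' unified memory. Here's a rough sketch. The 4-bit and 8-bit quantization levels, the 75% "usable for weights" fraction, and the 256 GB per-machine figure are all assumptions for illustration (192 GB is the M2 Ultra's top config; the 256 GB config is hypothetical). KV cache and runtime overhead are ignored.

```python
import math


def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes).

    Billions of params * bits / 8 gives GB directly. Ignores KV cache,
    activations, and runtime overhead.
    """
    return params_billion * bits_per_param / 8


def machines_needed(
    params_billion: float,
    bits_per_param: float,
    ram_gb_per_machine: float,
    usable_fraction: float = 0.75,  # assumption: share of RAM available for weights
) -> int:
    """Minimum number of identical machines whose usable RAM holds the weights."""
    usable_per_machine = ram_gb_per_machine * usable_fraction
    return math.ceil(weights_gb(params_billion, bits_per_param) / usable_per_machine)


if __name__ == "__main__":
    for bits in (8, 4):
        for ram in (192, 256):  # 256 GB is a hypothetical higher-RAM config
            print(f"{bits}-bit, {ram} GB/machine: "
                  f"{weights_gb(680, bits):.0f} GB of weights, "
                  f"{machines_needed(680, bits, ram)} machines")
```

Under these assumptions a 4-bit 680B model (about 340 GB) needs three 192 GB machines but only two 256 GB ones, which is the kind of drop that removes an interconnect hop from the pipeline.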