Hacker News
by ActorNightly 17 days ago
The main memory bus is 300 GB/s, which is on par with a Pro-chip MacBook. The Max chip gets 600 GB/s of unified memory bandwidth (roughly 500 in practice for token generation), but only in the 40-core variant, which runs $7k+, ironically more than a desktop with dual 3090 cards. The 32-core variant, which is still wildly expensive, gets ~400 GB/s.
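As a rough rule of thumb, single-stream token generation on a dense model is memory-bandwidth-bound: each new token reads roughly all of the weights once, so bandwidth divided by model size gives a ceiling on tokens/s. A minimal Python sketch of that arithmetic, assuming a hypothetical ~40 GB set of weights (about a 70B model at ~4.5 bits per weight) and ignoring KV-cache traffic and compute:

```python
def max_decode_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough ceiling on single-stream decode speed for a dense model.

    Assumes every generated token streams all weights from memory once
    (batch size 1; KV-cache reads and compute time are ignored).
    """
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    return bandwidth_gb_s / model_gb


# Hypothetical ~40 GB of weights; illustrative only, not a benchmark.
MODEL_GB = 40.0

for label, bandwidth in [
    ("300 GB/s bus", 300.0),
    ("32-core Max, ~400 GB/s", 400.0),
    ("40-core Max, ~500 GB/s effective", 500.0),
]:
    ceiling = max_decode_tokens_per_sec(bandwidth, MODEL_GB)
    print(f"{label}: <= {ceiling:.1f} tok/s")
```

Real numbers land below these ceilings, but the ratio between machines roughly tracks their bandwidth ratio.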

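Where this will really crush Apple is the initial prefill phase: 6000+ cores vs. 32/40, plus active cooling with fans. Prefill processes the whole prompt in parallel and is compute-bound, whereas token generation is mostly limited by memory bandwidth. For local LLMs, this matters quite a bit more than tokens/second.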
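A common approximation for a dense model's forward pass is about 2 FLOPs per parameter per token, ignoring attention, so prefill time scales with prompt length divided by usable compute. A minimal sketch of that estimate; the 100 vs. 25 TFLOPS figures and the 50% utilization below are hypothetical placeholders, not measured specs of either machine:

```python
def prefill_seconds(
    params_billion: float,
    prompt_tokens: int,
    tflops: float,
    utilization: float = 0.5,
) -> float:
    """Estimate prefill time for a dense model.

    Uses ~2 FLOPs per parameter per prompt token (attention ignored) and
    assumes only a fraction `utilization` of peak throughput is achieved.
    """
    flops = 2.0 * params_billion * 1e9 * prompt_tokens
    return flops / (tflops * 1e12 * utilization)


# Hypothetical: a 70B model with an 8,000-token prompt on two machines
# with placeholder peak throughputs of 100 and 25 TFLOPS.
for label, peak in [("faster GPU (hypothetical)", 100.0), ("slower GPU (hypothetical)", 25.0)]:
    print(f"{label}: ~{prefill_seconds(70, 8000, peak):.1f} s to process the prompt")
```

Under these assumptions a 4x compute gap is a 4x gap in time-to-first-token, regardless of memory bandwidth.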

In the end, neither is really worth it for LLM use compared to just building a desktop and port-forwarding to Ollama over SSH.
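A minimal sketch of that setup in Python, assuming Ollama on the desktop listens on its default port 11434 on localhost. The host `user@desktop` and any model name you pass in are placeholders. The SSH tunnel forwards local port 11434 to the desktop's localhost:11434, after which the laptop can call Ollama's `/api/generate` endpoint as if it were running locally:

```python
import json
import urllib.request

OLLAMA_PORT = 11434  # Ollama's default listen port


def ssh_tunnel_cmd(host: str, local_port: int = OLLAMA_PORT,
                   remote_port: int = OLLAMA_PORT) -> list[str]:
    """Build an ssh command that forwards a local port to Ollama on `host`.

    -N: run no remote command, just hold the tunnel open.
    -L: local_port on this machine -> localhost:remote_port on `host`.
    """
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", host]


def build_generate_request(model: str, prompt: str,
                           port: int = OLLAMA_PORT) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"http://localhost:{port}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, port: int = OLLAMA_PORT) -> str:
    """Send the prompt through the tunnel and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt, port)) as resp:
        return json.loads(resp.read())["response"]
```

Usage: start the tunnel in one terminal with the command `ssh_tunnel_cmd("user@desktop")` builds (`ssh -N -L 11434:localhost:11434 user@desktop`), then call `generate(...)` with a model you have pulled on the desktop. Setting `"stream": False` asks Ollama for a single JSON reply instead of a stream of chunks, which keeps the client trivial.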


Given memory prices lately, I doubt this will be much cheaper. It's also quite a bit slower than even a 4070, let alone the Nvidia *90 cards, although those have much less memory.