by NotSammyHagar 550 days ago
I saw your recent post on running Llama 3.3 70B on a 64 GB M2 Pro. Do the many Apple silicon variants, with their varying numbers of CPU cores, GPU cores, and Neural Engine cores, matter that much for how fast these LLMs can generate tokens and answer questions? More hardware is always better, but what can we say about how performance scales across the many different choices?

64 GB of RAM is crucial; after that you need 1+ TB of storage, and then?

1 comments

I don't know. I believe memory bandwidth matters, and I got the impression that the M4 series isn't yet as good as the M2 was on that front, but I'm half-remembering things I've heard here.
FYI - M4 has more bandwidth

  Chip       Bandwidth (GB/s)
  ----       ----------------
  M2         100
  M3         100
  M4         120 (20% more)

  M2-Pro     204
  M3-Pro     153 (less than M2-Pro)
  M4-Pro     273 (78% more than M3-Pro)

  M2-Max     409
  M3-Max     409
  M4-Max     546 (33% more than M2/M3-max)
https://arstechnica.com/apple/2024/10/apples-m4-m4-pro-and-m...
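
To put those bandwidth numbers in terms of token speed, here is a rough sketch. It assumes the common rule of thumb that generation is memory-bandwidth-bound, so each new token needs roughly one full read of the weights and tokens/sec can't exceed bandwidth divided by model size. The 4-bit quantization (about 35 GB for a 70B model) is my assumption, not something from the post, and the function names are made up for illustration. Real speeds will come in lower.

```python
# Bandwidth figures (GB/s) copied from the table above.
BANDWIDTH_GBPS = {
    "M2": 100, "M3": 100, "M4": 120,
    "M2-Pro": 204, "M3-Pro": 153, "M4-Pro": 273,
    "M2-Max": 409, "M3-Max": 409, "M4-Max": 546,
}


def model_size_gb(params_billions, bits_per_weight):
    """Approximate weight size in GB (ignores KV cache and other overhead)."""
    return params_billions * bits_per_weight / 8


def max_tokens_per_sec(bandwidth_gbps, size_gb):
    """Upper bound: assumes every token requires one full pass over the weights."""
    return bandwidth_gbps / size_gb


if __name__ == "__main__":
    # Assumption: a 70B-parameter model quantized to 4 bits per weight.
    size = model_size_gb(70, 4)
    print(f"Assumed model size: {size:.0f} GB")
    for chip, bw in BANDWIDTH_GBPS.items():
        print(f"{chip:8s} {max_tokens_per_sec(bw, size):5.1f} tok/s (upper bound)")
```

By this estimate the ceiling scales linearly with bandwidth, so the Pro and Max tiers matter more than the chip generation. Note that the base chips may not even be offered with enough RAM to hold a model this size; the loop just runs the arithmetic for every row.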