|
They didn't increase the memory bandwidth. You can get the same memory bandwidth, which is available on the M2 Studio. Yes, yes, of course you can get 512 gigabytes of uRAM for 10 grand. The the question is if a llm will run with usable performance at that scale? The point is there's diminishing returns despite having enough uRAM with the same amount of memory bandwidth even with increased processing speed of the new chip for AI. So there must be a min-max performance ratio between memory bandwidth and the size of the memory pool in relation to the processing power. |
From my napkin math, the M3 Ultra TFLOPs is still relatively low (around 43 FP16 TFLOPs?), but it should be more than enough to handle bs=1 token generation (should be way <10 FLOPs/byte for inference). Now as far is its prefill/prompt processing speed... well, that's another matter.