|
|
|
|
|
by superlopuh
36 days ago
|
|
I'm curious and not an expert here, do you know why the TTFT is so much worse on Mac? To elaborate, the article just says that this step is compute bound, but I'm wondering whether it is just that simple or if it might also be less optimised in MLX? |
|
The RTX 5090 has an incredible amount of compute performance for matrix operations and a lot of memory bandwidth. The Apple Silicon parts have unusually high memory bandwidth for general purpose compute chips, which is why they can generate tokens so fast. Their raw matrix compute performance is amazing for their power envelope but not nearly as fast as a dedicated GPU consuming 400-500W.
Apple added tensor cores on the M5 generation which help with those matrix operations, which is why the M5 performs so much better than the M4 Max in that article.
Dedicate GPUs like the RTX 5090 are in another league, though.
You can see the divergence in the high resolution gaming benchmarks, too. Once he starts benchmarking at 4K or 6K where the CPU emulation stops being a bottleneck, the raw compute of the 5090 completely crushes any of the Apple Silicon GPUs.