|
|
|
|
|
by bigyabai
4 hours ago
|
|
The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. All of that bandwidth is wasted idling for token prefill. For inference workloads, it makes a lot more sense to optimize for prefill/ttft before maxing out memory bandwidth. |
|