|
|
|
|
|
by ademeure
34 days ago
|
|
Apple GPUs didn’t have tensor cores until the M5 (aka “a neural accelerator in each core”) and in the article’s charts that a M5 Pro significantly beats a M4 Max (while in other workloads it would be much smaller since Pro is ~1/2 Max). EDIT: since Aurornis beat me by 3 minutes, I’ll add another interesting tidbit instead :) NVIDIA tensor cores on consumer GPUs are massively less powerful per SM core than on their datacenter counterparts-parts (which also makes them easier to get to peak efficiency on consumer GPUs because the rest of the pipeline is much more quickly a bottleneck as per Amdahl’s Law). This is potentially changing with Vera Rubin CPX which looks an awful lot like a RTX 5090 replacement but with the full-blown datacenter tensor cores (that won’t be available unless you pay for the datacenter SKU) - so it will have very high TFLOPS relative to its bandwidth. The target market for the CPX is exactly this: prefill and Time To First Token. You can basically just throw compute at the problem for (parts of) prefill performance (but it won’t help anything else past a certain point) and the 5090/M5 are nowhere near that limit. So the design choice for NVIDIA/Apple/etc of how much silicon to spend for this on consumer GPUs is mostly dictated by economics and how much they can reuse the same chips for the different markets. |
|