by adrian_b
589 days ago
Your initial claim was ambiguous. It sounded as if you were saying that a single core already reaches 1 TFLOP/s, which would imply that using more cores gets you more than that, and that is false. You have now clarified that your actual point is that it is good that a single core can reach the maximum throughput of the shared matrix operation accelerator. This is correct, but there is no essential difference between this and a Zen 5 CPU that reaches the same throughput using only half of its cores, leaving the other half free for any other tasks.

(Also, that’s an M2 number, since that’s what OP was talking about. Someone will presumably post M4 benchmarks for BLAS sometime soon, if they haven’t already.)