Hacker News
by adrian_b 589 days ago
Your initial claim was ambiguous.

It sounded as if you were claiming that a single core already reaches 1 TFLOP/s, implying that you could go beyond that by using more cores, which is false.

Now you have clarified that what you actually claim is that it is an advantage for a single core to be able to reach the maximum throughput of the shared matrix-operation accelerator.

This is correct, but there is no essential difference between this and a Zen 5 CPU that reaches the same throughput using only half of its cores, leaving the other half free for any other tasks.
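A back-of-the-envelope sketch of the "half of the cores" argument: the theoretical peak of a group of cores is cores × clock × FLOP per cycle per core, so the number of cores needed for a target throughput follows directly. The helper name `cores_needed`, the clock, the FLOP/cycle figure and the `efficiency` factor are all illustrative assumptions, not measured Zen 5 numbers.

```python
import math


def cores_needed(target_flops: float, clock_hz: float,
                 flops_per_cycle_per_core: float,
                 efficiency: float = 1.0) -> int:
    """Smallest core count whose combined throughput reaches target_flops.

    Hypothetical helper: per-core throughput is modeled as
    clock_hz * flops_per_cycle_per_core * efficiency, where efficiency is
    the assumed fraction of theoretical peak a real SGEMM sustains.
    """
    per_core = clock_hz * flops_per_cycle_per_core * efficiency
    return math.ceil(target_flops / per_core)


# Illustrative assumptions only, NOT measured Zen 5 figures:
# a 4.0 GHz sustained clock and 64 FP32 FLOP/cycle/core
# (e.g. two 512-bit FMA pipes x 16 FP32 lanes x 2 FLOP per FMA).
print(cores_needed(1.5e12, 4.0e9, 64))                  # at theoretical peak
print(cores_needed(1.5e12, 4.0e9, 64, efficiency=0.8))  # at an assumed 80% of peak
```

The `efficiency` parameter is there because BLAS kernels rarely sustain full theoretical peak, so the real core count depends on measured, not nominal, throughput.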


What’s the power draw of however many Zen 5 cores you have to tie up to hit, say, 1.5 TFLOP/s on SGEMM?

(Also, that’s an M2 number, since that’s what OP was talking about. Someone will presumably post M4 BLAS benchmarks soon, if they haven’t already.)
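To answer the power question with numbers, SGEMM work is conventionally counted as 2·M·N·K floating-point operations (one multiply and one add per term of each inner product, ignoring the O(M·N) alpha/beta scaling), then divided by measured time and measured power. A minimal sketch; the helper names are hypothetical, and the matrix size, timing and wattage in the usage lines are placeholders, not benchmark results.

```python
def sgemm_flops(m: int, n: int, k: int) -> int:
    """Conventional FLOP count for C = A @ B with A (m x k) and B (k x n)."""
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Measured throughput in TFLOP/s for one SGEMM call of the given shape."""
    return sgemm_flops(m, n, k) / seconds / 1e12


def gflops_per_watt(m: int, n: int, k: int, seconds: float, watts: float) -> float:
    """Energy efficiency: measured GFLOP/s divided by measured power draw."""
    return sgemm_flops(m, n, k) / seconds / 1e9 / watts


# Placeholder inputs (hypothetical timing and power, not real measurements):
print(achieved_tflops(4096, 4096, 4096, seconds=0.1))
print(gflops_per_watt(4096, 4096, 4096, seconds=0.1, watts=50.0))
```

Comparing GFLOP/W rather than raw TFLOP/s is what makes the M2 versus Zen 5 comparison meaningful when one side ties up more cores to reach the same throughput.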