by wtallis
534 days ago
I believe the throughput shown in those tables is the total throughput for the whole CPU core, so it isn't immediately obvious which instructions have high throughput because of pipelining within a single execution unit and which have high throughput simply because the core has several execution units that can handle that instruction.
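To make the ambiguity concrete, here is a minimal sketch under a deliberately simplified model (an assumption, not how any real table is computed): each execution unit accepts a new op every `issue_interval` cycles, and the core-wide rate is the sum over all units that can run the instruction. The `Unit` class and both configurations are hypothetical. A fully pipelined unit on one port and four unpipelined units on four ports end up with the same core-wide reciprocal throughput, so a table that reports only that number can't tell them apart.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Unit:
    """One execution unit on one port (hypothetical, simplified model)."""
    latency: int         # cycles from inputs to result (does not affect throughput here)
    issue_interval: int  # cycles between new ops on this unit; 1 means fully pipelined


def core_ops_per_cycle(units: list[Unit]) -> Fraction:
    # Each unit starts one op every issue_interval cycles; the core-wide rate
    # is the sum over every unit able to execute the instruction.
    return sum((Fraction(1, u.issue_interval) for u in units), Fraction(0))


def reciprocal_throughput(units: list[Unit]) -> Fraction:
    # The single number a table reports: average cycles per instruction, whole core.
    return 1 / core_ops_per_cycle(units)


# Hypothetical config A: one fully pipelined unit with a 4-cycle latency.
pipelined_single_port = [Unit(latency=4, issue_interval=1)]
# Hypothetical config B: four unpipelined units, each with a 4-cycle latency.
unpipelined_four_ports = [Unit(latency=4, issue_interval=4) for _ in range(4)]

print(reciprocal_throughput(pipelined_single_port))   # 1
print(reciprocal_throughput(unpipelined_four_ports))  # 1
```

`Fraction` is used only so the results come out exact rather than as floating-point approximations.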
For example, for many years Intel chips had a multiplier unit on a single port, with a latency of 3 cycles but an inverse throughput of 1 cycle, so it was effectively pipelined across 3 stages.
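The arithmetic behind "latency 3, inverse throughput 1" can be sketched with a toy single-unit model (an assumption: in-order issue, one unit, no front-end or scheduling limits). The function `cycles_to_finish` is hypothetical. Independent multiplies overlap, so 100 of them finish after roughly 3 + 99 cycles, while a dependent chain has to wait the full latency for each result. An unpipelined unit with the same latency would take as long as the dependent chain even for independent work.

```python
def cycles_to_finish(n_ops: int, latency: int, issue_interval: int, dependent: bool) -> int:
    """Cycle at which the last of n_ops results is ready on one unit (toy model).

    An op starts when the unit can accept it and, if the ops form a dependency
    chain, when the previous op's result is available.
    """
    next_free = 0   # earliest cycle the unit can accept a new op
    prev_ready = 0  # cycle the previous op's result becomes available
    for _ in range(n_ops):
        start = max(next_free, prev_ready if dependent else 0)
        next_free = start + issue_interval
        prev_ready = start + latency
    return prev_ready


# Latency 3, one new op per cycle: the pipelined multiplier from the comment.
print(cycles_to_finish(100, latency=3, issue_interval=1, dependent=False))  # 102
print(cycles_to_finish(100, latency=3, issue_interval=1, dependent=True))   # 300
# Hypothetical unpipelined unit with the same latency, for comparison.
print(cycles_to_finish(100, latency=3, issue_interval=3, dependent=False))  # 300
```

In this model, latency divided by issue interval (3 / 1 = 3) is the number of multiplies in flight at once, which is the sense in which the unit is "pipelined across 3 stages".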
In any case, I think uops.info [1] has replaced Agner Fog's tables as the source for up-to-date, detailed information on instruction execution.
---
[1] https://uops.info/table.html