by janwas
751 days ago
On x86 at least, the cost of OoO is astonishing: dispatching an instruction costs more energy (in pJ) than the operation itself. Amortizing that cost over more operations is the whole point of SIMD. I have not yet seen such data for Arm. That aside, see the "cmp" sibling thread for a major downside of 4x128 (a 4x penalty).

Could you point me to the "cmp" thread you mentioned? I don't know where to look for it.