by electricshampo1
1806 days ago
People very often look at icache misses when assessing the perf effects of code size, layout, etc., when there is something more precise to look at: frontend stalls. You only care about misses when they cause stalls; otherwise they are overlapped with actual work being done by the execution units. On many recent Intel chips you can measure frontend stalls with IDQ_UOPS_NOT_DELIVERED.CORE (see https://perfmon-events.intel.com/). Arm's Neoverse N1 has STALL_FRONTEND; see https://developer.arm.com/documentation/PJDOC-466751330-5476...
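As a rough sketch of how that counter turns into a number you can act on: Intel's top-down method divides IDQ_UOPS_NOT_DELIVERED.CORE by the total issue slots, i.e. pipeline width times unhalted core cycles (CPU_CLK_UNHALTED.THREAD). The function name below is made up for illustration, the default width of 4 is an assumption that fits Skylake-era and earlier cores (newer cores are wider), and the sample counter values are invented, not measurements.

```python
def frontend_bound_fraction(uops_not_delivered, core_cycles, slots_per_cycle=4):
    """Fraction of issue slots the frontend failed to fill.

    uops_not_delivered: IDQ_UOPS_NOT_DELIVERED.CORE count
    core_cycles:        CPU_CLK_UNHALTED.THREAD count
    slots_per_cycle:    issue width; 4 is an assumption (Skylake-era cores)
    """
    if core_cycles <= 0:
        raise ValueError("core_cycles must be positive")
    total_slots = slots_per_cycle * core_cycles
    return uops_not_delivered / total_slots


# Invented counter values, for illustration only.
fraction = frontend_bound_fraction(
    uops_not_delivered=1_200_000_000,
    core_cycles=1_000_000_000,
)
print(f"frontend bound: {fraction:.1%}")
```

A high fraction here says the frontend is starving the backend, which is the signal that icache miss counts alone don't give you.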
For the frontend you can guess that PGO, BOLT, or huge pages might help, but it's still a blind guess without knowing what to look at next.
Intel's TMA is really the only helpful thing here. It's a bit sad that AMD and Arm don't provide a way to calculate something TMA-like themselves.