by electricshampo1 1806 days ago
Very often people look at icache misses when assessing the perf effects of code size, layout, etc., when there is something more precise to look at: frontend stalls. You only care about misses when they cause stalls; otherwise they are overlapped with actual work being done by the execution units.

You can measure frontend stalls on many recent Intel chips with the event

IDQ_UOPS_NOT_DELIVERED.CORE

https://perfmon-events.intel.com/

Neoverse N1 from Arm has STALL_FRONTEND; see

https://developer.arm.com/documentation/PJDOC-466751330-5476...
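To make the two counters comparable, it helps to normalize them. A rough sketch, assuming you have already collected the raw counts (with perf or similar): on 4-wide Intel cores IDQ_UOPS_NOT_DELIVERED.CORE counts unfilled issue slots, so it is divided by width * cycles, while STALL_FRONTEND on the N1 counts cycles, so it is divided by CPU_CYCLES. The function names and the sample numbers are made up for illustration, and the width of 4 is an assumption that does not hold on every core.

```python
def intel_frontend_bound(uops_not_delivered, cycles, width=4):
    """Fraction of issue slots the frontend failed to fill.

    uops_not_delivered: IDQ_UOPS_NOT_DELIVERED.CORE
    cycles:             CPU_CLK_UNHALTED.THREAD
    width:              issue slots per cycle (assumed 4; check your core)
    """
    if cycles <= 0 or width <= 0:
        raise ValueError("cycles and width must be positive")
    return uops_not_delivered / (width * cycles)


def arm_frontend_stall_fraction(stall_frontend, cpu_cycles):
    """Fraction of cycles in which the frontend delivered nothing.

    stall_frontend: STALL_FRONTEND
    cpu_cycles:     CPU_CYCLES
    """
    if cpu_cycles <= 0:
        raise ValueError("cpu_cycles must be positive")
    return stall_frontend / cpu_cycles


if __name__ == "__main__":
    # Hypothetical counter values, not from a real run.
    print(f"Intel: {intel_frontend_bound(1_200_000_000, 1_000_000_000):.1%}")
    print(f"Arm:   {arm_frontend_stall_fraction(250_000_000, 1_000_000_000):.1%}")
```

Note the two numbers are not directly comparable: the Intel one is per slot, the Arm one is per cycle.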

1 comments

I agree with you that one can very often get distracted by single events; however, knowing that you are frontend/backend bound isn't all that much more helpful either.

For frontend-bound code you can guess that PGO, BOLT, or huge pages might help, but it's still a blind guess without knowing what to look at next.

Intel's TMA (Top-down Microarchitecture Analysis) is the only helpful thing here really. Bit sad that AMD and Arm don't provide a way to calculate something TMA-like themselves.
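For reference, the top level of TMA is just arithmetic over a handful of counters. A minimal sketch of the level-1 breakdown, assuming the published formulas for 4-wide cores (roughly the Haswell through Skylake generations); newer cores use a different width and dedicated top-down counters, so treat this as illustrative only. The function name, dict layout, and sample numbers are hypothetical.

```python
def tma_level1(c, width=4):
    """Split pipeline slots into the four TMA level-1 buckets.

    c maps event names to raw counts and must contain:
      CPU_CLK_UNHALTED.THREAD, IDQ_UOPS_NOT_DELIVERED.CORE,
      UOPS_ISSUED.ANY, UOPS_RETIRED.RETIRE_SLOTS, INT_MISC.RECOVERY_CYCLES
    width is the assumed number of issue slots per cycle.
    """
    slots = width * c["CPU_CLK_UNHALTED.THREAD"]
    if slots <= 0:
        raise ValueError("no cycles counted")

    frontend = c["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
    retiring = c["UOPS_RETIRED.RETIRE_SLOTS"] / slots
    # Slots wasted on uops that never retired, plus recovery after a flush.
    bad_spec = (
        c["UOPS_ISSUED.ANY"]
        - c["UOPS_RETIRED.RETIRE_SLOTS"]
        + width * c["INT_MISC.RECOVERY_CYCLES"]
    ) / slots
    # Backend bound is whatever is left over.
    backend = 1.0 - (frontend + bad_spec + retiring)

    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


if __name__ == "__main__":
    # Hypothetical counts, not from a real run.
    sample = {
        "CPU_CLK_UNHALTED.THREAD": 1000,
        "IDQ_UOPS_NOT_DELIVERED.CORE": 800,
        "UOPS_ISSUED.ANY": 2200,
        "UOPS_RETIRED.RETIRE_SLOTS": 2000,
        "INT_MISC.RECOVERY_CYCLES": 50,
    }
    for name, value in tma_level1(sample).items():
        print(f"{name:16s} {value:.1%}")
```

The useful part of TMA is the levels below this one, which say where inside the frontend or backend to look next; this sketch only covers the first split.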