Hacker News new | ask | show | jobs
by twoodfin 826 days ago
Obscure personal conspiracy theory: The CPU vendors, notably Intel, deliberately avoid adding the telemetry that would make it trivial for the OS to report a % spent in memory wait.

Users might realize how many of their cores and cycles are being effectively wasted by limits of the memory / cache hierarchy, and stop thinking of their workloads as “CPU bound”.

3 comments

Arm v8.4 onwards has exactly this (https://docs.kernel.org/arch/arm64/amu.html). It counts the number of (active) cycles where instructions can't be dispatched while waiting for data. There can be a very high percentage of idle cycles. Lots of improvements to be found with faster memory (latency and throughput).
The performance counters for that have been in the chips for a long time. You can argue that perf(1) has unfriendly UX of course.
I think AMD has a tool to check something somewhat related (Cache misses) in AMD uProf
Right, so does Intel in at least their high end chips. But a count of last-level misses is just one factor in the cost formula for memory access.

I appreciate it’s a complicated and subjective measurement: Hyperthreading, superscalar, out-of-order all mean that a core can be operating at some fraction of its peak (and what does that mean, exactly?) due to memory stalls, vs. being completely idle. And reads meet the instruction pipeline in a totally different way than writes do.

But a synthesized approximation that could serve as the memory stall equivalent of -e cycles for perf would be a huge boon to performance analysis & optimization.