by shepmaster 997 days ago
A solution is mentioned in the article, but perhaps obliquely:

> while I also wanted to measure hardware counters

As I understand it, hardware counters would remain consistent in the face of the normal noisy CI runner.

The article talks about using Cachegrind (via the iai crate) and Linux perf events.

I use iai in one of my projects to run performance diffs for each commit.


> As I understand it, hardware counters would remain consistent in the face of the normal noisy CI runner.

With cloud CI runners you'd still have issues with hardware differences, e.g. different CPUs counting slightly differently. Even memcpy behavior is hardware-dependent! And if you're measuring multi-threaded programs, concurrent algorithms may be sensitive to timing. Microcode updates for the latest CPU vulnerabilities can also shift results. And that's just instruction counts. Other metrics such as cycle counts, cache misses or wall-time are far more sensitive.

To make sure we're not slowly accumulating <1% regressions hidden in the noise, and to be able to attribute regressions to a specific commit, we need really low noise levels.

So for reliable, comparable benchmarks, dedicated hardware is needed.
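To put rough numbers on the accumulation concern, here is a minimal sketch in Rust. The function names and the 0.5% / 2% figures are made up for illustration and do not come from any real measurement; the only assumption is that per-commit regressions compound multiplicatively.

```rust
/// Cumulative slowdown after `commits` commits that each regress by
/// `per_commit` (0.005 means 0.5%). Assumes regressions compound
/// multiplicatively, which is an illustrative simplification.
fn cumulative_slowdown(per_commit: f64, commits: i32) -> f64 {
    (1.0 + per_commit).powi(commits) - 1.0
}

/// A single regression can only be attributed to a commit if it is larger
/// than the run-to-run noise of the benchmark (both given as fractions).
fn is_detectable(regression: f64, noise: f64) -> bool {
    regression.abs() > noise
}
```

With these hypothetical numbers, `cumulative_slowdown(0.005, 20)` comes out at roughly 0.105, i.e. about a 10% slowdown after 20 commits, while `is_detectable(0.005, 0.02)` is false for every one of those commits when the noise floor is 2%.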

> With cloud CI runners you'd still have issues with hardware differences

For my project it really is the diff of each commit: I start from a parent commit that isn’t part of the PR and re-measure that, then measure each new commit in turn. Since both sides of every diff are measured in the same CI run, this should cancel out changes in hardware as well as things like Rust versions (if those aren’t locked in via rustup).

The rest of your points are valid of course, but this was a good compromise for my OSS project where I don’t wish to spend extra money.
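A rough sketch of the per-commit diff idea, in Rust. This is not the actual code from my project: the `Measurement` type and `per_commit_diffs` function are hypothetical, and it assumes the instruction counts were already collected (for example by an iai run per commit) and are non-zero.

```rust
/// Instruction count measured for one commit (hypothetical type).
struct Measurement {
    commit: String,
    instructions: u64,
}

/// Percentage change of each commit relative to the commit before it.
/// `baseline` is the re-measured parent commit that is not part of the PR,
/// so every diff compares two measurements taken in the same CI run.
/// Assumes all instruction counts are non-zero.
fn per_commit_diffs(baseline: &Measurement, commits: &[Measurement]) -> Vec<(String, f64)> {
    let mut prev = baseline.instructions as f64;
    let mut diffs = Vec::with_capacity(commits.len());
    for m in commits {
        let cur = m.instructions as f64;
        diffs.push((m.commit.clone(), (cur - prev) / prev * 100.0));
        // The next commit is compared against this one, not the baseline.
        prev = cur;
    }
    diffs
}
```

Because the baseline is measured again every time rather than read from a stored result, a runner with a different CPU shifts both sides of each diff together.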

The thing is that things like Cachegrind are supposed to be used as complements to time-based profilers, not to replace them.

If you're getting ±20% differences between runs of a time-based benchmark, it might just be noisy neighbors, but it could also be some other problem that actually manifests for users too.

> used as complements to time-based profilers, not to replace them

Sure. I also use hyperfine to run a bigger test as a user would see the system. I cross reference that with the instruction counts. I use these hardware metrics in a free CI runner, and hyperfine locally.