by rictic 1578 days ago

Many benchmarking systems face measurement issues that make it difficult to produce solid results. Any two runs might differ in hardware, OS, compiler, runtime, dependency versions, system load, temperature, and so on.

One robust solution is to instead do pairwise comparisons, repeated many times in round-robin fashion, so that both versions are measured under the same conditions. The results aren't quite as nice to plot, as you don't get a single consistent speed value, but they are much more reliable, and you still get useful information, like ">95% chance that this test is at least 20% faster at this commit than at the previous one".
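
To make the idea concrete, here is a minimal sketch of round-robin sampling followed by a paired bootstrap. It is not how tachometer itself computes its numbers; the function names (`interleaved_samples`, `prob_at_least_faster`), the choice of a bootstrap over the mean, and the reading of "20% faster" as "takes at most 80% of the old time" are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, List, Sequence


def interleaved_samples(
    variants: Sequence[Callable[[], object]], rounds: int
) -> List[List[float]]:
    """Time every variant once per round, cycling through them in order.

    Interleaving means slow drift (load, temperature) affects all variants
    roughly equally instead of biasing whichever one happened to run last.
    """
    samples: List[List[float]] = [[] for _ in variants]
    for _ in range(rounds):
        for i, fn in enumerate(variants):
            start = time.perf_counter()
            fn()
            samples[i].append(time.perf_counter() - start)
    return samples


def prob_at_least_faster(
    old: Sequence[float],
    new: Sequence[float],
    threshold: float,
    resamples: int = 2000,
    seed: int = 0,
) -> float:
    """Bootstrap estimate of how often mean(new) <= (1 - threshold) * mean(old).

    Assumption: old[k] and new[k] come from the same round, so rounds are
    resampled as pairs (the same indices are used for both lists).
    """
    if len(old) != len(new) or not old:
        raise ValueError("need equal-length, non-empty sample lists")
    rng = random.Random(seed)
    n = len(old)
    hits = 0
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_old = sum(old[i] for i in idx) / n
        mean_new = sum(new[i] for i in idx) / n
        if mean_new <= (1.0 - threshold) * mean_old:
            hits += 1
    return hits / resamples
```

Given two hypothetical callables `run_old` and `run_new`, something like `old, new = interleaved_samples([run_old, run_new], rounds=50)` followed by `prob_at_least_faster(old, new, 0.20)` would yield the kind of statement quoted above. A real harness would also want warm-up rounds and process isolation, which this sketch leaves out.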

A project I contribute to uses this strategy: https://github.com/Polymer/tachometer, but I'd love it if more benchmarks took this approach.