| HN Mirror


	by anp 2816 days ago
	Each benchmark result is only compared against values from runs on literally the same machine, actually. I agree that good results here would be extremely difficult to produce on virtualized infra, so I rented a few cheap dedicated servers from Hetzner. I'm glad that I decided to pin results to a single machine, because even across these identically binned machines from Hetzner I saw 2-4% variance when I ran some phoronix benches to compare. I go into a little bit of detail on this in the talk I link to towards the bottom of the post; here's a direct link for convenience: https://www.youtube.com/watch?v=gSFTbJKScU0.
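To make the "same machine only" rule concrete, here is a minimal sketch in Python. It is not the actual tooling from the post: the `(machine_id, benchmark_name, nanoseconds)` record shape and the function name `relative_changes` are assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def relative_changes(baseline, current):
    """Compare results only against baselines from the same machine.

    Both arguments are lists of (machine_id, benchmark_name, nanoseconds)
    tuples. This record shape is hypothetical, chosen for illustration.
    Returns {(machine_id, benchmark_name): fractional change}.
    """
    def group(records):
        grouped = defaultdict(list)
        for machine, bench, nanos in records:
            grouped[(machine, bench)].append(nanos)
        # Average repeated runs of the same benchmark on the same machine.
        return {key: mean(values) for key, values in grouped.items()}

    old, new = group(baseline), group(current)
    # Keys present on only one side are skipped, so a result from machine A
    # is never compared against a baseline recorded on machine B.
    return {key: (new[key] - old[key]) / old[key] for key in old.keys() & new.keys()}
```

With a 2-4% spread between nominally identical machines, keying on the machine keeps that spread from being mistaken for a regression of similar size.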

2 comments

usefulcat 2816 days ago

A suggestion: consider using callgrind to measure performance (instructions retired, cache misses, branch mispredictions, whatever) instead of wall clock time. It will be much slower per run, but since it will also be precise you shouldn't need to do multiple runs, and you should be able to run a bunch of different benchmarks concurrently without them interfering with each other or having anything else interfere with them.
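For anyone who wants to try this, a rough sketch of driving callgrind from a script might look like the following. The helper names are hypothetical; the valgrind flags (`--tool=callgrind`, `--cache-sim`, `--branch-sim`, `--callgrind-out-file`) are the documented Callgrind options, but check them against your installed version.

```python
import subprocess


def callgrind_command(benchmark_argv, out_file):
    """Build a valgrind command line that runs a benchmark under Callgrind.

    `benchmark_argv` is the benchmark executable plus its arguments.
    """
    return [
        "valgrind",
        "--tool=callgrind",
        "--cache-sim=yes",   # also simulate the cache hierarchy
        "--branch-sim=yes",  # also simulate branch prediction
        f"--callgrind-out-file={out_file}",
        *benchmark_argv,
    ]


def run_under_callgrind(benchmark_argv, out_file):
    """Run the benchmark once under Callgrind; requires valgrind to be installed."""
    subprocess.run(callgrind_command(benchmark_argv, out_file), check=True)
```

The resulting output file can then be inspected with `callgrind_annotate`.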

anp 2816 days ago

I currently do something pretty similar by using the perf subsystem in the Linux kernel to track the behavior of each benchmark function. In my early measurements I found concurrent benchmarking to introduce unacceptable noise even with this measurement tool and with cgroups/cpusets used to pin the different processes to their own cores. Instead of trying to tune the system to account for this, I chose to build tooling for managing a single runner per small cheap machine.
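As a rough illustration of the kind of measurement described here, the sketch below builds a `perf stat` invocation pinned to one core. This is not the author's tooling: it uses `taskset` as a simpler stand-in for the cgroups/cpusets setup mentioned above, and the function name and default event list are assumptions.

```python
def pinned_perf_command(benchmark_argv, core,
                        events=("instructions", "cache-misses", "branch-misses")):
    """Build a command that runs a benchmark on one core under `perf stat`.

    `taskset -c` pins the process to `core` (a stand-in for cpusets), and
    `perf stat -x ,` emits the counter values as comma-separated fields.
    """
    return [
        "taskset", "-c", str(core),
        "perf", "stat",
        "-x", ",",               # machine-readable, comma-separated output
        "-e", ",".join(events),  # hardware counters to record
        "--",
        *benchmark_argv,
    ]
```

Pinning alone does not isolate shared caches or memory bandwidth, which is consistent with the noise reported above when several benchmarks ran concurrently.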

usefulcat 2816 days ago

No such 'noise' is possible with callgrind, as it's basically simulating the hardware. If you're using a VM it seems like you could still get variation between different runs due to other activity on the host system.

claudius 2816 days ago

The problem with callgrind is its simulated branch predictor (http://valgrind.org/docs/manual/cg-manual.html#branch-sim):

> Cachegrind simulates branch predictors intended to be typical of mainstream desktop/server processors of around 2004.

In other words, the data produced by Callgrind may be suitable for finding obvious regressions, but there may still be other regressions that only show up on more modern CPUs.

v_lisivka 2816 days ago

Please don't, because the memory access pattern will be very different.

CUViper 2816 days ago

Some of those target benchmarks are on Rayon, and we've found that valgrind interferes with threading way too much to be useful there.

shepmaster 2816 days ago

This is one of the many metrics of the official Rust compiler performance benchmarks [1].

[1]: https://perf.rust-lang.org/nll-dashboard.html

MikeHolman 2816 days ago

I haven't used callgrind, but wouldn't running benchmarks concurrently still lead to cache interference?

usefulcat 2816 days ago

No, because callgrind is simulating the hardware, including the caches. Which is why it's also much slower.

valarauca1 2816 days ago

Thanks for the link. I'll give it a watch :D
