Hacker News
by google2342 2503 days ago
You shouldn't use the mean when benchmarking. Better to use the median or the fastest time. Lots of random things can happen on a computer (usually in the OS) that make a single run take 1000x longer, and a few such runs drag the mean far away from the typical time.
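To make that concrete, here is a minimal sketch using made-up timings (the numbers and the `summarize` helper are purely illustrative, not measurements from a real benchmark). One run out of six hits a 1000x stall, and the mean, median, and fastest time are computed with Python's standard `statistics` module:

```python
import statistics

def summarize(timings_ms):
    """Return the mean, median, and fastest time of a list of run times."""
    return {
        "mean": statistics.mean(timings_ms),
        "median": statistics.median(timings_ms),
        "fastest": min(timings_ms),
    }

# Made-up data: five ordinary runs near 1 ms plus one run that hit a 1000x stall.
timings_ms = [1.02, 0.98, 1.01, 0.99, 1.00, 1000.0]
summary = summarize(timings_ms)

for name, value in summary.items():
    print(f"{name:>8}: {value:.3f} ms")
```

With this data the mean lands above 100 ms even though five of the six runs took about 1 ms, while the median and the fastest time both stay near 1 ms.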
2 comments

Interesting point. The %timeit functionality actually used to output the mean of the x fastest runs; they seem to have changed that at some point. The docs still explain the old behavior [1].

I assume that they feel your concerns are addressed because they display the stddev together with the mean, so you can see if there were any extreme outliers.

[1] https://ipython.org/ipython-doc/dev/interactive/magics.html#...
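For anyone who wants to compare the two summaries described in this comment outside of IPython, here is a rough sketch built on the standard library's `timeit.repeat`. It is not how %timeit is implemented; the helper names `mean_of_fastest` and `mean_and_stddev`, the benchmarked statement, and the choice of k=3 are all assumptions made for illustration:

```python
import statistics
import timeit

def mean_of_fastest(timings, k):
    """Mean of the k smallest timings (the 'x fastest runs' style of summary)."""
    return statistics.mean(sorted(timings)[:k])

def mean_and_stddev(timings):
    """Mean and sample standard deviation over all timings."""
    return statistics.mean(timings), statistics.stdev(timings)

# Seven repeats of 1000 loops each; each entry is the total time for one repeat.
loops = 1000
runs = timeit.repeat("sum(range(1000))", repeat=7, number=loops)
per_loop = [total / loops for total in runs]

mean, stddev = mean_and_stddev(per_loop)
print(f"mean of 3 fastest: {mean_of_fastest(per_loop, 3) * 1e6:.2f} us per loop")
print(f"mean +/- stddev:   {mean * 1e6:.2f} us +/- {stddev * 1e6:.2f} us per loop")
```

A large stddev relative to the mean is the hint that some repeats were hit by outliers; the mean of the fastest runs simply ignores those repeats.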

If you're testing a single-threaded benchmark, then the test statistics aren't going to be meaningfully different in interpretation, especially if you're only asking the question "is A or B faster?" What's more important is that you capture enough runs to characterize the distribution well; if you have that, you'll get meaningful results no matter which statistic you're actually reporting.
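A small simulation can illustrate the "is A or B faster?" case. This is only a sketch under strong assumptions: the run times are synthetic, drawn from a seeded Gaussian (real timing noise is usually skewed toward slow outliers), and `faster_variant` is a hypothetical helper. With 200 runs each and a clear gap between the two variants, all three statistics give the same verdict:

```python
import random
import statistics

def faster_variant(runs_a, runs_b, stat):
    """Return 'A' if variant A looks faster under the given statistic, else 'B'."""
    return "A" if stat(runs_a) < stat(runs_b) else "B"

# Synthetic run times in ms: A centered on 1.00, B centered on 1.20.
rng = random.Random(0)
runs_a = [rng.gauss(1.00, 0.02) for _ in range(200)]
runs_b = [rng.gauss(1.20, 0.02) for _ in range(200)]

stats = [("mean", statistics.mean), ("median", statistics.median), ("fastest", min)]
verdicts = {name: faster_variant(runs_a, runs_b, stat) for name, stat in stats}
print(verdicts)
```

The statistics are more likely to disagree when the two variants are close or when only a handful of runs are collected, which is where the number of runs matters most.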