| HN Mirror

Perf is not normally distributed. Or rather, it is very very common for it to not be. Where I work, we often see multimodal (usually mostly bimodal) distributions, and we'll get performance alerts that when you look at them are purely a result of more samples happening to shift from one mode to another.

It's easy to construct ways that this could happen. Maybe you're running a benchmark that does a garbage collection or three. It's easy for a GC to be a little earlier or a little later, and sneak in or out of the timed portion of the test.

Warm starts vs cold starts can also do this. If you don't tear everything down and flush before beginning a test, you might have some amount of stuff cached.

The law of large numbers says you can still make it normal by running enough times and adding them up (or running each iteration long enough), but (1) that takes much longer and (2) smushed together data is often less actionable. You kind of want to know about fast paths and slow paths and that you're falling off the fast path more often than intended.

As usual you can probably cover your eyes, stick your fingers in your ears, and proceed as if everything were Gaussian. It'll probably work well enough!