by weinzierl 350 days ago
Is your code really fast if you haven't measured it properly? I'd say measuring is hard but a prerequisite for writing fast code, so truly fast code is harder.

The number one mistake I see people make is measuring one time and taking the results at face value. If you do nothing else, measure three times and you will at least have a feeling for the variability of your data. If you want to compare two versions of your code with confidence there is usually no way around proper statistical analysis.
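To make the "measure three times" advice concrete, here is a minimal sketch using Python's standard `timeit` module. The `work` function and the `measure` helper are hypothetical stand-ins, not anything from the post; the point is only that repeating the measurement gives you a rough spread to look at.

```python
import timeit


def work():
    # Hypothetical workload standing in for the code under test.
    return sum(i * i for i in range(10_000))


def measure(func, repeats=3, number=100):
    """Time `func` in `repeats` independent runs of `number` calls each.

    Returns the per-call time in seconds for every run.
    """
    totals = timeit.repeat(func, repeat=repeats, number=number)
    return [total / number for total in totals]


samples = measure(work)
spread = max(samples) - min(samples)

print("per-call time (s):", [f"{s:.3e}" for s in samples])
print(f"spread: {spread:.3e} s ({spread / min(samples):.1%} of the fastest run)")
```

Three samples only give a feeling for the variability. They are not a substitute for the statistical analysis needed to compare two versions with confidence.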

Which brings me to the second mistake. When measuring runtime, taking the mean is not a good idea. Runtime measurements usually skew heavily towards a theoretical minimum which is a hard lower bound. The distribution is heavily lopsided with a long tail. If your objective is to compare two versions of some code, the minimum is a much better measure than the mean.
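A small simulation can illustrate why the minimum is the steadier statistic here. This is a sketch under an assumed noise model (a hard 10 ms floor plus exponentially distributed delay), which is my choice for illustration and not something measured. `simulate_runs` is a hypothetical helper.

```python
import random
import statistics


def simulate_runs(rng, floor_ms, n):
    # Assumed model: runtime = hard lower bound + long-tailed positive noise.
    return [floor_ms + rng.expovariate(1.0) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Pretend we benchmarked the same code 20 separate times, 50 runs each.
batches = [simulate_runs(rng, floor_ms=10.0, n=50) for _ in range(20)]

mins = [min(batch) for batch in batches]
means = [statistics.fmean(batch) for batch in batches]

# How much does each summary statistic wobble from one benchmark session to the next?
print(f"stdev of per-batch minimum: {statistics.stdev(mins):.4f} ms")
print(f"stdev of per-batch mean:    {statistics.stdev(means):.4f} ms")
```

Under this model the per-batch minimum moves around much less than the per-batch mean, so a real difference between two versions is easier to tell apart from noise. Real systems will not follow this model exactly.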

4 comments

> The distribution is heavily lopsided with a long tail.

You'll see this in any properly active online system. At my previous job we had to drill it into teams that mean() was never an acceptable latency measurement. For that reason, the telemetry agent we used provided out-of-the-box p50 (median), p90, p95, p99 and max values for every timer measurement window.

The difference between p99 and max was an incredibly useful indicator of poor tail latency cases. After all, every one of those max figures was an occurrence of someone or something experiencing the long wait.

These days, if I had the pleasure of dealing with systems where individual nodes handled thousands of messages per second, I'd add p999 to the mix.
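For anyone who hasn't computed these by hand, here is a rough sketch of a per-window summary with the percentiles mentioned above. It uses the nearest-rank definition, which is one of several common ones, and I don't know which one that telemetry agent used. The `percentile` and `summarize` helpers and the latency data are made up for illustration.

```python
import math
from fractions import Fraction


def percentile(ordered, p):
    """Nearest-rank percentile of an ascending list: the smallest sample
    such that at least p percent of all samples are at or below it."""
    if not ordered:
        raise ValueError("no samples")
    # Fraction keeps the rank exact, so e.g. 99.9 does not round up by one.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms):
    ordered = sorted(latencies_ms)
    return {
        "mean": sum(ordered) / len(ordered),
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "p999": percentile(ordered, 99.9),
        "max": ordered[-1],
    }


# Synthetic window of 1000 requests: mostly fast, a few slow, one terrible.
latencies = [5.0] * 989 + [50.0] * 10 + [2000.0]
summary = summarize(latencies)

for name, value in summary.items():
    print(f"{name:>5}: {value:8.3f} ms")
```

In this made-up window the mean sits close to the typical request, and only the gap between p99 and max shows that somebody waited two seconds.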

Fast code isn't a quantum effect; it doesn't wait for a measurement to collapse into being fast. The _assertion_ that a certain piece of code is fast probably requires a measurement (maybe you can get away with reasoning, e.g. algorithmic complexity or counting instructions; each has its flaws, but so does measurement).

For comparing HFT implementations, the 99th percentile is often more practical than the minimum, since it accounts for tail latency while excluding extreme outliers caused by GC pauses or OS scheduling.

If you're serious about performance, you generally want to use a benchmark library like JMH for Java or BenchmarkDotNet for .NET. At least for languages with garbage collection, just-in-time compilation and other runtime optimization, there are a lot of things to consider, and these libraries help you get accurate results.