| HN Mirror

It’s possible, I guess, but it wouldn’t be my first thought. It’s too slow for that.

A payload of 1.6KB at 45us/B is 75ms, which is below the typical scheduling quantum of about 100ms. (Can’t say anything about Erlang, let alone Erlang bindings to C libraries, but I wouldn’t expect it to be that much smaller either, precisely because of the switching overhead both direct and indirect.) So a single compression operation shouldn’t be getting preempted enough to affect the results.

Typical RAM bandwidth is tens of GB/s (even consumer-class SSDs[1] are single-digit GB/s) so with tens to hundreds of cores that’s not enough to affect anything, and even taking into account the compressor’s window is measured in megabytes not kilobytes that’s likely not enough (it would be a bad compressor that reread its whole window each time, anyway). And the data we’re compressing is not only minuscule, it has just been generated and is virtually guaranteed to be cached.

Honestly, I almost want to say that the benchmark is measuring the wrong thing somehow, except they’re reporting a 2× speedup switching from one compressor to another. So it can’t be the JSON encoding overhead or whatnot, and, unless one of the Erlang bindings is somehow drastically stupider than the other, it shouldn’t be the FFI overhead, and even those are a huge stretch. The Flying Spaghetti Monster be merciful, I cannot see anything here that we could be spending over a hundred million cycles on.

At this point I’m hoping somebody just mixed up the units, because this is really unsettling.

[1] https://lemire.me/en/talk/perfsummit2020/