HN Mirror

by tombert 350 days ago

I remember in 2017, I was trying to benchmark some highly concurrent code in F# using the async monad.

I was using timers, and I was getting insanely different times for the same code, going anywhere from 0ms to 20ms without any obvious changes to the environment or anything.

I was banging my head against it for hours, until I realized that async code is weird. Async code isn’t directly “run”, it’s “scheduled” and the calling thread can yield until we get the result. By trying to do microbenchmarks, I wasn’t really testing “my code”, I was testing the .NET scheduler.

It was my first glimpse into seeing why benchmarking is deceptively hard. I think about it all the time whenever I have to write performance tests.

sfn42 350 days ago

Isn't that part of the point? If the code runs in the scheduler then its performance is relevant. Same with garbage collection: if the garbage collector slows your algorithm down then you usually want to know. You can try to avoid allocations and such to improve performance, and measure the effect using your benchmarks.

Maybe you don't always want to include this, and I can see how it might be challenging to isolate just the code itself. It might be possible to swap out the scheduler, synchronization context, etc. for implementations better suited to that kind of benchmark?

tombert 350 days ago

Yes, but it was initially leading to some incorrect conclusions on my end, about certain things being “slow”.

For example, because I was trying to use fine-grained timers for everything async, I thought the JSON parsing library we were using was a bottleneck, because I saw some numbers like 30ms to parse a simple thing. I wasn’t measuring total throughput, I was measuring individual items for parts of the flow and incorrectly assumed that that applied to everything.

You just have to be a bit more careful than I was with using timers. Either make sure that your timer doesn’t span any kind of yield point, or only use timers in a more “macro” sense (e.g. measure total throughput). Otherwise you risk misleading numbers and bad conclusions.
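
A rough illustration of the difference, sketched in Python's asyncio rather than F# (the event loop here only stands in for the .NET scheduler, and `parse`, `handle`, and `main` are made-up names): each per-item timer spans a yield point, so it mostly measures how long the task waited to be resumed, while the single outer timer gives total throughput.

```python
import asyncio
import time


def parse(item: str) -> int:
    # Hypothetical stand-in for cheap synchronous work
    # (not the JSON library mentioned in the thread).
    return len(item)


async def handle(item: str, per_item: list) -> None:
    start = time.perf_counter()
    await asyncio.sleep(0)  # yield point: other ready tasks run before we resume
    parse(item)
    # This interval includes the time spent waiting to be rescheduled.
    per_item.append(time.perf_counter() - start)


async def main(n: int = 200):
    per_item: list = []
    start = time.perf_counter()
    await asyncio.gather(*(handle(f"item-{i}", per_item) for i in range(n)))
    total = time.perf_counter() - start  # the "macro" measurement
    return per_item, total


if __name__ == "__main__":
    per_item, total = asyncio.run(main())
    print(f"total for {len(per_item)} items: {total * 1e3:.3f} ms")
    print(f"sum of per-item timers: {sum(per_item) * 1e3:.3f} ms")
```

The per-item timers overlap, so their sum comes out larger than the wall-clock total. Reading any one of them as the cost of `parse` would repeat the mistake described above.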

sfn42 350 days ago

I would highly recommend using a specialized library like BenchmarkDotNet. It's more relevant for microbenchmarks but can be used for less micro benchmarks as well.

It will do things like force you to build in Release mode to avoid debug overhead, do warmup cycles and other measures to avoid various pitfalls related to how .NET works - JIT, runtime optimization and stuff like that, and it will output nicely formatted statistics at the end. Rolling your own benchmarks with simple timers and stuff can be very unreliable for many reasons.
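
To make the warmup-and-statistics idea concrete, here is a minimal hand-rolled sketch in Python. It is not BenchmarkDotNet's API and does far less than that library; `bench` and its parameters are hypothetical names. It only shows the shape: discard a few warmup runs, then collect many samples and summarize them instead of trusting a single timer reading.

```python
import statistics
import time


def bench(fn, warmup: int = 5, runs: int = 30) -> dict:
    # Warmup runs are executed but not recorded, so one-time costs
    # (caches, lazy initialization, JIT on runtimes that have one)
    # do not skew the samples.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    # Report a distribution rather than a single number.
    return {
        "runs": runs,
        "min": min(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),
    }


if __name__ == "__main__":
    result = bench(lambda: sum(range(10_000)))
    print(result)
```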

tombert 350 days ago

Oh no argument on this at all, though I haven't touched .NET in several years, since I no longer have a job doing F# (though if anyone here is hiring for it please contact me!).

Even so, I don't know that a benchmarking tool would be helpful in this particular case, at least at a micro level; I think you'd mostly be benchmarking the scheduler more than your actual code. At a more macro scale, however, like benchmarking the processing of 10,000 items, it would probably still be useful.

bob1029 350 days ago

> Isn't that part of the point? If the code runs in the scheduler then its performance is relevant.

That's the entire point.

Finding out you have tens of milliseconds of slop because of TPL should instantly send you down a warpath to use threads directly, not encourage you to find a way to cheat the benchmarking figures.

Async/await for mostly CPU-bound workloads can carry latency overhead on the order of 100-1000x. Accepting the harsh reality at face value is the best way to proceed most of the time.

Async/await can work on the producer side of an MPSC queue, but it is pretty awful on the consumer side. There's really no point in yielding every time you finish a batch. Your whole job is to crank through things as fast as possible, usually at the expense of energy efficiency and other factors.
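
A rough sketch of that split, written in Python as an analogy rather than with TPL or .NET threads (all names here are hypothetical): several async producers feed one thread-safe queue, and a single dedicated consumer thread blocks on it and processes items without ever yielding to an async scheduler.

```python
import asyncio
import queue
import threading

SENTINEL = None  # marks the end of the stream


def consumer(q: queue.Queue, results: list) -> None:
    # Dedicated thread: blocks on the queue and works through items
    # back to back, with no await or yield between them.
    while True:
        item = q.get()
        if item is SENTINEL:
            return
        results.append(item * 2)  # stand-in for the real work


async def producer(q: queue.Queue, first: int, count: int) -> None:
    for i in range(first, first + count):
        q.put(i)  # unbounded queue, so put() does not block the event loop
        await asyncio.sleep(0)  # producers are free to yield


async def run(producers: int = 4, per_producer: int = 1000) -> list:
    q: queue.Queue = queue.Queue()
    results: list = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    await asyncio.gather(
        *(producer(q, p * per_producer, per_producer) for p in range(producers))
    )
    q.put(SENTINEL)  # enqueued after every item, so the consumer sees it last
    worker.join()  # blocks the loop briefly; acceptable here since nothing else is pending
    return results


if __name__ == "__main__":
    out = asyncio.run(run())
    print(len(out), "items processed")
```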
