How would you benchmark something like this? Run multiple processes concurrently and then sort by total run time? Or measure individual process wait time?
I guess both make sense, along with a lot of other things: synthetic benchmarks, microbenchmarks, real-world benchmarks, and best/average/worst-case comparisons of both latency and throughput.
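
For what it's worth, here is a rough sketch of how both measurements could be collected in one run. It is only an illustration under my own assumptions: `busy_work`, `timed_worker`, `summarize`, and `run_benchmark` are made-up names, the CPU-bound loop is a placeholder workload, and the "wait" figure includes pool startup time, so it is only a crude proxy for scheduling delay.

```python
import multiprocessing as mp
import statistics
import time


def busy_work(iterations):
    """Placeholder CPU-bound workload (hypothetical, swap in the real one)."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def timed_worker(args):
    """Return (wait, run) in seconds for one task.

    wait: wall-clock time between submission and the task starting
          (time.time() is used because it is comparable across processes).
    run:  time the task spent executing the workload.
    """
    submitted_at, iterations = args
    wait = time.time() - submitted_at
    run_start = time.perf_counter()
    busy_work(iterations)
    run = time.perf_counter() - run_start
    return wait, run


def summarize(samples):
    """Best/average/worst case over a list of timings."""
    return {
        "best": min(samples),
        "average": statistics.mean(samples),
        "worst": max(samples),
    }


def run_benchmark(n_procs=4, n_tasks=8, iterations=200_000):
    """Run n_tasks across n_procs worker processes and collect both views."""
    wall_start = time.perf_counter()
    submitted_at = time.time()
    with mp.Pool(n_procs) as pool:
        results = pool.map(timed_worker, [(submitted_at, iterations)] * n_tasks)
    total = time.perf_counter() - wall_start

    return {
        "total_seconds": total,                          # total run time
        "throughput_tasks_per_second": n_tasks / total,  # throughput view
        "wait": summarize([w for w, _ in results]),      # latency view
        "run": summarize([r for _, r in results]),
    }


if __name__ == "__main__":
    for name, value in run_benchmark().items():
        print(name, value)
```

A single run like this is noisy, so in practice you would probably repeat it several times and compare the distributions rather than one number.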