I have been sleeping on this for quite a while (long covid is a bitch), but I have built a benchmarking lib that sidesteps quite a few of these problems, by:

- running the benchmark in thin slices, interspersed and shuffled, rather than in one big batch per item (which also avoids having one scenario penalized by transient noise)

- displaying graphs that reveal possible multi-modal distributions when the JIT gets in the way

- varying the length of the thin slices between runs to work around the poor timer resolution in browsers

- assigning the results of the benchmark to a global (or to a variable in the parent scope, as in the web demo below) to avoid dead code elimination

This isn't a panacea, but it is better than the existing solutions as far as I'm aware. There are still issues: sometimes, even though the task order is shuffled for each slice, the literal source order can influence how (or whether) a bit of code is compiled, which makes the results unreliable. The "thin slice" approach can also smear GC time across scenarios if they don't all produce the same amount of garbage. I think it is, however, a step in the right direction.

- CLI runner for Node: https://github.com/pygy/bunchmark.js/tree/main/packages/cli

- WIP web UI: https://flems.io/https://gist.github.com/pygy/3de7a5193989e0...

Either way, if you've used JSPerf you should feel right at home in the web UI. The CLI is meant to replicate the web UI as closely as possible (see the example file).
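For illustration, here is a minimal sketch of the thin-slice idea described above. It is not bunchmark's actual code or API: `Task`, `shuffle`, `runSlices`, `sink`, and the repetition bounds are all made-up names and values, and a real tool would need much more care around warm-up and statistics.

```typescript
// Hypothetical task shape: a label plus the function under test.
type Task = { name: string; fn: () => unknown };

// Results are written to a variable in the parent scope so that the
// engine has a harder time proving the benchmarked work is dead code.
let sink: unknown;

// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
function shuffle<T>(items: T[]): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Runs every task once per round, in a fresh random order each round,
// and records the mean time per call (in ms) for each slice.
function runSlices(
  tasks: Task[],
  rounds: number,
  minReps = 50, // assumed bounds, chosen arbitrarily for the sketch
  maxReps = 200
): Map<string, number[]> {
  const samples = new Map<string, number[]>();
  for (const task of tasks) samples.set(task.name, []);

  for (let round = 0; round < rounds; round++) {
    // Vary the slice length between rounds so that coarse timer
    // resolution does not always quantize the measurement the same way.
    const reps = minReps + Math.floor(Math.random() * (maxReps - minReps + 1));

    for (const task of shuffle(tasks)) {
      const start = performance.now();
      for (let i = 0; i < reps; i++) sink = task.fn();
      const elapsed = performance.now() - start;
      samples.get(task.name)!.push(elapsed / reps);
    }
  }
  return samples;
}
```

Calling something like `runSlices([{ name: "a", fn: fnA }, { name: "b", fn: fnB }], 100)` would yield 100 per-slice samples for each task. Plotting those as a histogram, rather than reducing them to a single mean, is what makes multi-modal distributions visible.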