by krausest 3633 days ago
You're basically right. Thus we've decided to perform 10 runs for each benchmark and drop the worst 4. This eliminates cases where the GC or a background system process causes a slow run. As a consequence it also strongly reduces variance, but it is blind in the GC eye. It would be interesting to include GC, but I have no idea how to do it in a really good way.
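A minimal sketch of that "10 runs, drop the worst 4" step, in Python. It assumes the remaining runs are simply averaged; the function name `mean_of_best` and the sample timings are made up for illustration and are not taken from the benchmark's code.

```python
from statistics import mean


def mean_of_best(durations_ms, drop_worst=4):
    """Average the fastest runs after discarding the `drop_worst` slowest ones."""
    if drop_worst < 0 or drop_worst >= len(durations_ms):
        raise ValueError("drop_worst must be between 0 and len(durations_ms) - 1")
    # Sort ascending so the slowest runs end up at the tail, then cut the tail off.
    kept = sorted(durations_ms)[: len(durations_ms) - drop_worst]
    return mean(kept)


# Hypothetical timings (ms) for 10 runs of one benchmark; the three large
# values stand in for runs disturbed by GC or a background process.
runs = [102, 98, 101, 250, 99, 100, 97, 180, 103, 300]
print(mean_of_best(runs))  # 99.5
```

With these numbers the disturbed runs (180, 250, 300) never reach the average, which is exactly why the variance drops and why GC cost becomes invisible.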

The results are comparable (but nowhere near equal) between runs, and the difference between the frameworks is in most cases large enough that the ranking stays consistent. (But you wouldn't prefer preact to cycle.js v7 for having a 0.01 better result in the average slowdown, would you? For ember, mithril and cycle.js v6, performance might be something to consider depending on your use case.)

If you look at the console output, note that it is just an approximation; the real measurement is performed on data from Chrome's timeline using Selenium. The console measurement uses a timer to guess when rendering was performed. Of course this is less accurate than using the timeline.

3 comments

I think an equally interesting result would be to keep only the worst 1-4 benchmark results and compare those. In my experience some frameworks put much more long-term memory pressure on the GC, and by discarding those results, your benchmark turns a blind eye to that.

In other words, I'd be more interested in the worst possible performance of a given framework than in its best (assuming the app is written in a reasonably good way otherwise).

Also, +1 to the sibling post that asks for longer runs; that would solve the GC question by including it altogether!
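The "keep only the worst 1-4 results" idea could be sketched like this in Python. The helper name `mean_of_worst` and the timings are hypothetical, and averaging the kept runs is an assumption; reporting the single maximum would be another reasonable choice.

```python
from statistics import mean


def mean_of_worst(durations_ms, keep_worst=4):
    """Average only the `keep_worst` slowest runs."""
    if not 1 <= keep_worst <= len(durations_ms):
        raise ValueError("keep_worst must be between 1 and len(durations_ms)")
    # Sort ascending and take the tail, i.e. the slowest runs.
    return mean(sorted(durations_ms)[-keep_worst:])


# Hypothetical timings (ms) for 10 runs of one benchmark.
runs = [102, 98, 101, 250, 99, 100, 97, 180, 103, 300]
print(mean_of_worst(runs))      # 208.25
print(mean_of_worst(runs, 1))   # 300
```

Comparing this number across frameworks would highlight the ones whose slow runs are much slower than their typical runs, for example because of GC pauses.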

Do the benchmarks take a long time? Why not as many runs as necessary to give you a tight confidence interval?
Running all the benchmarks takes a few hours.

I'll keep that in mind for the next round. Let's see if that works nicely. Would you still drop outliers?
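For reference, "as many runs as necessary for a tight confidence interval" might look roughly like the Python sketch below. Everything here is an assumption made for illustration: the name `run_until_tight`, the 2% target, the cap on runs, and the normal approximation (z = 1.96) for a 95% interval. With few samples a t-distribution would be more appropriate.

```python
import random
from statistics import mean, stdev


def run_until_tight(measure, rel_half_width=0.02, min_runs=5, max_runs=200, z=1.96):
    """Call `measure()` until the approximate 95% confidence interval of the
    mean is narrower than +/- rel_half_width * mean, or max_runs is reached.
    Returns (mean, half_width, number_of_runs)."""
    if min_runs < 2:
        raise ValueError("need at least 2 runs to estimate the spread")
    samples = [measure() for _ in range(min_runs)]
    while True:
        m = mean(samples)
        # Standard error of the mean, scaled by z for the interval half-width.
        half_width = z * stdev(samples) / len(samples) ** 0.5
        if half_width <= rel_half_width * m or len(samples) >= max_runs:
            return m, half_width, len(samples)
        samples.append(measure())


# Stand-in for a real benchmark run: a simulated duration in ms with some noise.
rng = random.Random(42)
avg, hw, n = run_until_tight(lambda: rng.gauss(100, 5))
print(f"mean={avg:.1f} ms, +/-{hw:.1f} ms after {n} runs")
```

Note that no outliers are dropped here, so GC pauses stay in the data and simply force more runs before the interval tightens; the `max_runs` cap keeps the total time bounded.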

Thanks for the explanation! That sounds like as good a methodology as any to me.