by bzbarsky 4209 days ago

Here's the problem with SplayLatency.

SplayLatency computes the root-mean-squared allocation time, and your final score is the reciprocal of that, scaled by some constant. Let's call the RMS measurement the "badness"; more badness is worse on this benchmark.
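Written out, the description above amounts to roughly the following. The symbols are my own shorthand, not the benchmark's: t_i is the time measured for allocation i, N is the number of allocations, and C is the unspecified scaling constant.

```latex
\text{badness} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} t_i^{2}},
\qquad
\text{score} = \frac{C}{\text{badness}}
```

Because each t_i is squared before averaging, one long pause costs far more than many short pauses that add up to the same total time.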

Say you're allocating 1000 objects and object allocation itself takes 0 time, so all that's being measured is the GC. You plan to GC them all before your JS runs to completion. You consider two alternative strategies.

One strategy is to run a GC every 10 allocations, so each GC collects 10 objects. For simplicity, say each GC has 1s of fixed overhead plus 1s per object collected, so each GC takes 11s. There are 100 such GCs, so your "badness" score on the benchmark is sqrt((100 * 11^2 + 900 * 0)/1000) = sqrt(12.1), or about 3.48, and the run takes 1100s to finish.

Now the second strategy: one GC every 100 allocations. Each GC now takes 101s, and with only 10 of them the run takes just 1010s to finish. But the splay "badness" score is sqrt((10 * 101^2 + 990 * 0)/1000) = sqrt(102.01) = 10.1.
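If you want to check the arithmetic for the two strategies, here is a minimal sketch of the toy model in Python. The function name and parameters are made up for illustration; it models only the simplified costs described above (free allocation, 1s overhead per GC, 1s per object collected), not the real benchmark.

```python
import math

def simulate(total_objects, batch_size, gc_overhead=1.0, cost_per_object=1.0):
    """Toy model: return (badness, total_time) when a GC runs every
    `batch_size` allocations and collects exactly that many objects."""
    pauses = []
    for i in range(1, total_objects + 1):
        if i % batch_size == 0:
            # This allocation triggers a GC and pays for the whole batch.
            pauses.append(gc_overhead + cost_per_object * batch_size)
        else:
            # Allocation itself is assumed to be free.
            pauses.append(0.0)
    # Root-mean-square of the per-allocation times.
    badness = math.sqrt(sum(p * p for p in pauses) / len(pauses))
    return badness, sum(pauses)

frequent = simulate(1000, 10)    # GC every 10 allocations
rare = simulate(1000, 100)       # GC every 100 allocations

print("every 10: ", frequent)
print("every 100:", rare)
```

The frequent-GC run comes out with the lower badness but the higher total time, which is exactly the mismatch being described.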

So per the benchmark the better strategy is the "GC more often" one. But for animations the _second_ strategy is better in this case, because the animations can't run while the JS is running to completion anyway. So as long as both strategies are collecting all the garbage before run-to-completion finishes, the one that's better for animations is the one with higher throughput. But that's the one Splay scores worse.

Back to the real problem we're trying to solve: what hurts animations is a GC strategy that aims for higher throughput by letting garbage pile up across multiple runs to completion and then ends up with a long GC pause at some point. Having a benchmark that penalized that sort of GC strategy would in fact be a good idea. But Splay is not that benchmark. In fact, the optimal GC strategy on Splay is to not GC at all until the benchmark finishes and then do one big GC that takes forever but isn't measured as part of the benchmark time.
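To make the degenerate case concrete, here is a small sketch that reuses the toy costs from the earlier example (1s overhead per GC, 1s per object, free allocation). Those numbers and all the names are illustrative assumptions, not measurements of any real engine.

```python
import math

def measured_badness(pauses):
    """Root-mean-square of the per-allocation times seen by the benchmark."""
    return math.sqrt(sum(p * p for p in pauses) / len(pauses))

TOTAL = 1000

# Deferred strategy: no GC inside the measured window at all,
# so every measured allocation costs 0.
deferred = [0.0] * TOTAL
# The one big GC happens after the benchmark stops timing (toy cost model).
deferred_worst_pause = 1.0 + 1.0 * TOTAL

# Chunked strategy: a GC every 10 allocations, all inside the measured window.
chunked = [11.0 if i % 10 == 0 else 0.0 for i in range(1, TOTAL + 1)]
chunked_worst_pause = max(chunked)

print("deferred badness:", measured_badness(deferred),
      "worst pause:", deferred_worst_pause)
print("chunked badness: ", measured_badness(chunked),
      "worst pause:", chunked_worst_pause)
```

In this model the deferred strategy gets a measured badness of zero, the best possible result, while its single unmeasured pause is far longer than anything the chunked strategy ever produces.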

Basically, SplayLatency sets up perverse incentives where the simplest ways to do better on the benchmark involve making animation pauses _worse_.

It's also possible to improve the score on SplayLatency by actually improving the throughput of your GC, but that's a lot more work than the other approaches, and just as likely to regress this benchmark if you do it by chunking your GC more within a single run to completion.

The end result is that improvements to this benchmark's score have little to do with reduction of user-visible GC pauses.

You can see some more in-depth discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=958492 but the above basically summarizes what's going on. The fix in that bug ended up just shuffling work around within a single run to completion to placate this benchmark, and the hard part was doing it in a way that didn't regress things too much for actual real-life animation use cases...