by IainIreland 827 days ago

(I work on SpiderMonkey.)

Benchmarking is hard. It is very easy to write a benchmark where improving your score does not improve real-world performance, and over time even a good benchmark will become less useful as the important improvements are all made. This V8 blog post about Octane is a good description of some of the issues: https://v8.dev/blog/retiring-octane

Speedometer 3, in my experience, is the least bad browser benchmark. It hits code that we know from independent evidence is important for real-world performance. We've been targeting our performance work at Speedometer 3 for the last year, and we've seen good results. My favourite example: a few years ago, we decided that initial pageload performance was our performance priority for the year, and we spent some time trying to optimize for that. Speedometer 3 is not primarily a pageload benchmark. Nevertheless, our pageload telemetry improved more from targeting Speedometer 3 than it did when we were deliberately targeting pageload. (See the pretty graphs here: https://hacks.mozilla.org/2023/10/down-and-to-the-right-fire...) This is the advantage of having a good benchmark; it speeds up the iterative cycle of identifying a potential issue, writing a patch, and evaluating the results.

1 comment

lapcat 827 days ago

This doesn't say anything about what the scores mean.

21 is apparently better than 20, but how much better? You could say "1 better", tautologically, but how does that relate to the real world?

Driving a car 1 mile per hour faster may be better, in a sense, but even if you drove 24 hours straight, it would only gain you 24 total miles, which is almost negligible on such a long trip. Nobody would be impressed by that difference.

charcircuit 827 days ago

It means it is 5% faster. You are overcomplicating it.

lapcat 826 days ago

Percentages are rarely informative without an absolute reference.

A 5% raise for someone who makes $20k per year is $1k, whereas a 5% raise for someone who makes $200k is $10k, which would be a 50% raise for the former.

zamadatix 825 days ago

You've demonstrated you understand how to use the score to compare both inter-browser performance (analogous to the amount each makes per year) as well as individual browser performance improvements (analogous to the amount of the raise). Seems pretty informative to me?

Vinnl 826 days ago

Iain explained that in a reply to your other comment: https://news.ycombinator.com/item?id=39672279

> "The score is a rescaled version of inverse time" is the key here.

> If you run all the tests in half the time, your Speedometer score will double. If your score improves by 1%, it implies that you are 1% faster on the subtests.

> (There are probably some subtleties here because we're using the geometric mean to avoid putting too much weight on any individual subtest, but the rough intuition should still hold.)
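
To make the quoted intuition concrete, here is a minimal sketch of a score defined as a rescaled inverse of the geometric mean of subtest times. This is not Speedometer's actual scoring code: the `score` function, the `scale` constant, and the subtest times are all hypothetical, chosen only to illustrate the relationship described above.

```python
from math import prod


def geometric_mean(values):
    """Geometric mean of a non-empty list of positive numbers."""
    return prod(values) ** (1.0 / len(values))


def score(subtest_times_ms, scale=1000.0):
    """Hypothetical score: a rescaled inverse of the geometric mean time.

    `scale` is an arbitrary constant (an assumption for this sketch);
    it only changes the units of the score, not the ratios between scores.
    """
    return scale / geometric_mean(subtest_times_ms)


# Made-up subtest times in milliseconds.
baseline = [120.0, 80.0, 200.0]

# Running every subtest in half the time doubles the score.
halved = [t / 2 for t in baseline]
ratio_all_halved = score(halved) / score(baseline)

# Halving only one of three subtests helps less: the geometric mean
# limits the weight of any single subtest (factor 2 ** (1 / 3)).
one_halved = [baseline[0] / 2, baseline[1], baseline[2]]
ratio_one_halved = score(one_halved) / score(baseline)

print(f"all subtests halved: score ratio = {ratio_all_halved:.3f}")
print(f"one subtest halved:  score ratio = {ratio_one_halved:.3f}")
```

Under these assumptions, the ratio between two scores depends only on the ratio of the times, which is why a score of 21 versus 20 reads as roughly 5% faster regardless of what the unit is.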

lapcat 826 days ago

https://news.ycombinator.com/item?id=39673609

itishappy 826 days ago

That's irrelevant. The speedometer reading is an absolute reference. The percentages being discussed are simply comparisons, and they're only being discussed to say "they behave like you'd expect."

To directly answer your original question: a reading of 21 is 5% better than a reading of 20 because 21 is 5% greater than 20, and this means that a 21 speed browser should do things 5% faster than a 20 speed browser.

TL;DR: They behave like you'd expect.

lapcat 826 days ago

> The speedometer reading is an absolute reference.

To what?

I talked about driving a car. Miles and hours are an absolute reference. We still have no absolute reference for Speedometer.

itishappy 826 days ago

To... itself? Go measure something. You now have a reference!

If you scratch out the labels of your car speedometer and forget which is which, it still measures speed. 80 is still 33% faster than 60, regardless of the units.

bigfudge 827 days ago

I guess that’s why it’s fairly interesting to see scores thrown out in this thread on random hardware. It’s anecdata, but it gives a sense of the spread/variance of scores for common platforms. I don’t think this is a number that is ever going to make much sense for consumers to use, because without this sort of context it’s just going to be like the Spinal Tap ‘this one goes to 11’ sort of problem.
