I may have missed it, but were there any stats about the actual performance gains? It often mentioned binary size, etc., but nothing about the impact it had.
I, too, found it kind of odd. The bulk of the savings they achieved came from dropping support for the Chromium browser (since Chromium no longer relies on WebKit) and C++ 11. This was a really bizarre article because it had a lot of detail about how to reduce the size of a C++ binary but not what that actually means for performance.
What did that 6% binary size reduction buy them in startup time? 10ms? 300ms?
What about memory consumption after startup?
Did tests indicate that the improved memory locality had boosted CPU performance in any noticeable way?
The problem is that the performance gains were measured on a patch-by-patch basis over a period of about a year.
Nobody kept all the numbers, and digging them back up is more work than I have time for.
I am sorry I no longer have the actual numbers. To give an idea of the order of magnitude:
- For startup speed, measuring the cold start of a new WebProcess, the size of the WebCore binary seems to have a direct relation to the time it takes to start the process. Cutting 5% of the binary gave about a 5% reduction in startup time.
- The inlining improvements gave a runtime boost on the order of a few (single-digit) percent. It was usually spread over many benchmarks rather than being specific to one part of WebCore.
- Some changes had surprising results. I don't remember specifics, but some changes (unrelated to initialization) improved startup time without changing runtime performance in any measurable way.
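As a rough illustration of the first point, here is a back-of-the-envelope sketch in Python. It assumes the roughly one-to-one relation between binary size and cold-start time holds, and the 200 ms baseline is a made-up placeholder, not a measured WebProcess figure. The function name is hypothetical too.

```python
def estimated_startup_saving_ms(baseline_startup_ms, size_reduction_fraction):
    """Estimate cold-start time saved, ASSUMING startup time scales
    roughly linearly with binary size (the ~5% for ~5% relation above)."""
    if not 0.0 <= size_reduction_fraction <= 1.0:
        raise ValueError("size_reduction_fraction must be between 0 and 1")
    return baseline_startup_ms * size_reduction_fraction


# Hypothetical baseline: a 200 ms cold start and a 6% smaller binary.
saving = estimated_startup_saving_ms(200.0, 0.06)
print(f"estimated saving: {saving:.1f} ms")
```

So whether the answer is closer to 10 ms or 300 ms depends entirely on the baseline cold-start time, which is exactly the number that would have to be measured.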
Exactly. You really should avoid doing performance optimizations without measurements that show improvement as well as provide some coverage against performance regressions.
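A minimal sketch of what such a guard could look like, in Python. The helper names and the 5% tolerance are assumptions chosen for illustration, not anything taken from WebKit or Thrift. A real setup would run representative workloads, not a toy function.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Return the median wall-clock time of fn() over `repeats` runs, in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def is_regression(baseline_s, candidate_s, tolerance=0.05):
    """True if the candidate is more than `tolerance` (as a fraction)
    slower than the baseline. The 5% default is an arbitrary choice."""
    return candidate_s > baseline_s * (1.0 + tolerance)


def workload():
    # Stand-in for a typical workload, e.g. a bulk insert.
    sum(i * i for i in range(100_000))


baseline = median_runtime(workload)
candidate = median_runtime(workload)  # would be the patched build in practice
print("regression" if is_regression(baseline, candidate) else "ok")
```

Taking the median of several runs rather than a single sample keeps one noisy run from deciding the outcome, and storing the baseline alongside each patch avoids having to reconstruct the numbers later.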
Here's an example: someone made a performance improvement to Thrift (as used in Hector, a Cassandra client):
The discussion has a lot of "shoulds", and one measurement of latency distributions, but no measurement of typical workloads or bulk inserts. Turns out, that caused at least a 30% performance regression: