by lostdog
1993 days ago
The high variance is another problem. Good software has low variance in performance, especially if you're sampling in the tens of thousands. The high variance does create two tactical problems for you.
First, how do you keep performance from getting worse? Typically you would set a threshold on the metrics and prevent checking in code that breaks it. With variance this high, you clearly cannot do that. Instead, make the barrier soft: if the performance tests break the threshold, you need signoff from a manager or senior engineer. This way you can keep making coding progress while adding just enough friction that people are careful about making performance worse.
The second problem of high variance is showing that you're making progress. For you, though, this isn't a real problem. You're not talking about cutting 500 microseconds off a 16 millisecond frame render; you need to cut 5-25 second page loads down by a factor of 10 at least. There must be dozens of dead-obvious problems taking up seconds of run time. Is Confluence's performance so atrocious that you couldn't statistically measure cutting the page load time in half?
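A minimal sketch of the soft barrier described above. Everything in it is hypothetical: `SoftPerfGate`, `can_merge`, the 5% allowed regression, and comparing medians of page-load samples are illustrative assumptions, not anything Confluence or a particular CI system provides. The point is only that breaking the threshold produces "needs signoff" rather than a hard block.

```python
import statistics
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class GateResult(Enum):
    PASS = "pass"
    # Soft failure: the check-in may still land, but only with human approval.
    NEEDS_SIGNOFF = "needs_signoff"


@dataclass
class SoftPerfGate:
    """Hypothetical soft threshold on page-load time, in milliseconds."""
    baseline_ms: List[float]
    max_regression_ratio: float = 1.05  # assumption: tolerate up to 5% slower median

    def check(self, candidate_ms: List[float]) -> GateResult:
        # Medians are less sensitive than means to the long tail of slow loads.
        base = statistics.median(self.baseline_ms)
        cand = statistics.median(candidate_ms)
        if cand <= base * self.max_regression_ratio:
            return GateResult.PASS
        return GateResult.NEEDS_SIGNOFF


def can_merge(result: GateResult, signoff: Optional[str] = None) -> bool:
    """Allow the merge if the gate passed, or if a manager/senior engineer signed off."""
    return result is GateResult.PASS or signoff is not None
```

For example, with a baseline of `[5000, 5200, 4800, 5100, 4900]` (median 5000 ms), a candidate run of `[5100, 5150, 5050]` passes, while `[6000, 5900, 6100]` returns `NEEDS_SIGNOFF` and can only merge once someone passes a non-empty `signoff`.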
Showing that we're making progress isn't as much of a problem. Similar to what you said, the fixes themselves target gains large enough that they're clearly measurable at volume, and even in testing.
The main issue is "degradations": catching any check-ins that degrade performance. These are usually small individually (say, low double-digit ms, within the variance noise), but they add up over time, and by the time the degradation is really measurable, it's complicated to track down the root cause. Hopefully I described that in a way that makes sense?
Any suggestions welcome.
(Edit: downvoted too much and replies are throttled again.) @lostdog Thanks for the detail! Will definitely take this to the eng team for a process discussion.
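For the small degradations that hide in the noise and only show up in aggregate, one technique worth considering (a suggestion, not something proposed in this thread) is a one-sided CUSUM chart over per-build measurements. No single regression stands out, but the cumulative sum of deviations above a target keeps growing while a small shift persists, and the point where that sum last left zero gives a rough starting build to bisect from. The sketch assumes each value is one build's aggregate page-load time in ms (for example, a median over many loads); `cusum_upper`, `target`, `slack`, and `threshold` are hypothetical names, and their values would need tuning to your own noise level (a common rule of thumb sets `slack` to about half the shift you want to detect).

```python
from typing import Optional, Sequence, Tuple


def cusum_upper(
    values: Sequence[float], target: float, slack: float, threshold: float
) -> Optional[Tuple[int, int]]:
    """One-sided (upward) CUSUM over per-build load times in ms.

    Returns (alarm_index, start_index), or None if no alarm fires.
    start_index is the first build after the running sum last sat at zero,
    a rough estimate of where the upward drift began.
    """
    s = 0.0
    start = 0
    for i, x in enumerate(values):
        # Accumulate only the excess above target + slack; never go below zero.
        s = max(0.0, s + (x - target - slack))
        if s == 0.0:
            start = i + 1  # sum reset, so any drift must begin after this build
        elif s > threshold:
            return i, start
    return None
```

With `target=5000`, `slack=10`, and `threshold=100`, the series `[5000, 4990, 5010] + [5030] * 6` (a sustained +30 ms shift starting at index 3) raises the alarm at index 8 and points back to index 3, while a flat series of 5000 ms never alarms.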