Yes, it's a kind of slippery slope: as typically set up, changes are biased towards degrading quality. If you run a normal significance test where the null is leaving the product unchanged, and the only alternatives you test are cost-cutting ones, then you will only ever ratchet downwards in quality: you either leave it unchanged, or you make a possibly erroneous choice which trades quality for cost savings, and your errors accumulate over many tests in a sorites. I discuss this in the footnote about Schlitz. To stop this, you need a built-in bias towards quality to neutralize the bias of your procedures, or you need to explicitly consider the long-term costs of mistakes and also test quality improvements. (Then quality will stay the same on average, since you only take tradeoffs when they genuinely pay for themselves, and you may even increase quality rather than degrade it.)
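
A toy simulation of the ratchet, as a sketch and not anything taken from the Schlitz footnote. It assumes every candidate change has zero true net value, so each adoption is a false positive of a one-sided test at roughly the 5% level. The function name `simulate`, the `directions` parameter, the step size, and the number of tests are all made up for illustration.

```python
import random

def simulate(n_tests, directions, step=1.0, z_crit=1.645, seed=0):
    """Run n_tests sequential tests and return the cumulative quality change.

    Assumption: no candidate has any real benefit, so every adoption is a
    false positive. `directions` lists the quality effects a candidate may
    have: -1 for a cost cut, +1 for a quality improvement.
    """
    rng = random.Random(seed)
    quality = 0.0
    for _ in range(n_tests):
        direction = rng.choice(directions)
        z = rng.gauss(0.0, 1.0)  # test statistic drawn under the null
        if z > z_crit:           # one-sided false positive, about 5% of the time
            quality += direction * step
    return quality

cuts_only = simulate(2000, directions=[-1])
both_ways = simulate(2000, directions=[-1, +1])
print("cost cuts only:       ", cuts_only)
print("cuts and improvements:", both_ways)
```

With only cuts on the table the total can never rise above zero, and it should sink by roughly 5% of a step per test. With both directions tested, the false positives point both ways and should roughly cancel, which is the "same on average" case.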