by gwern
2671 days ago
Yes, but it won't help with other problems like measuring the wrong metric. For example, the YouTube latency example linked at the bottom was a randomized A/B test ("launched an opt-in to a fraction of our traffic"), but it was measuring per-user latency when the population of 'users' had changed radically thanks to the improvements. To catch that, he would've needed to monitor some more global long-term effect like user retention or total traffic. Then he would've seen a result like 'latency got a lot worse, but we're getting a ton more users and they're coming back much more frequently, so, that's good overall but why is latency up and who are all these new users...? aha!'. You have a Simpson's paradox on the level of metrics here, instead of individuals.
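
To make the reversal concrete, here is a minimal Python sketch with made-up numbers (the group names, user counts, and latencies are all hypothetical, not taken from the YouTube post). It assumes two kinds of users, 'fast' and 'slow' connections, and that the improvement makes the page usable for many more slow-connection users.

```python
from statistics import mean

# Hypothetical per-user page latencies in seconds, grouped by connection type.
# Before: the page is so heavy that few slow-connection users bother loading it.
before = {"fast": [2.0] * 100, "slow": [20.0] * 10}
# After: every group is faster, and many more slow-connection users show up.
after = {"fast": [1.0] * 100, "slow": [10.0] * 100}


def summarize(groups):
    """Return per-group mean latency, pooled mean latency, and user count."""
    pooled = [x for xs in groups.values() for x in xs]
    return {
        "per_group": {name: mean(xs) for name, xs in groups.items()},
        "overall": mean(pooled),
        "users": len(pooled),
    }


summary_before = summarize(before)
summary_after = summarize(after)

for label, s in (("before", summary_before), ("after", summary_after)):
    print(label, s["per_group"], f"overall={s['overall']:.2f}", f"users={s['users']}")
```

Under these assumed numbers, latency falls within each group while the pooled per-user mean rises, because the mix of users shifts toward the slow group. The user count is the more global metric that reveals what actually happened.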