by paulddraper
671 days ago
To expand, the p-value tells you significance (more precisely, the probability of seeing an effect at least as large as the observed one if there were no underlying difference). But if you check it over and over again and act the first time it crosses your threshold, you've subverted the measure.

Thompson sampling/multi-armed bandit optimizes for outcome over the duration of the test, by progressively altering the treatment %. The test runs longer, but yields better outcomes while doing it. It's objectively a better way to optimize, unless there is time-based overhead to the existence of the A/B test itself (e.g. maintaining two code paths).
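To make the "observe it over and over" problem concrete, here is a minimal sketch of an A/A simulation: both arms share the same true conversion rate, so every "significant" result is a false positive. The function names, the 10% rate, the peek interval, and the choice of a pooled two-proportion z-test are all assumptions for illustration, not anything from a particular tool.

```python
import math
import random


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed yet, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_tests=1000, n_per_arm=1000, peek_every=50,
             rate=0.1, alpha=0.05, seed=0):
    """Return (false positive rate with peeking, rate with one final look).

    Both arms use the same hypothetical conversion rate, so any
    p < alpha is a false positive by construction.
    """
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        crossed = False  # did any interim look dip below alpha?
        for i in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if i % peek_every == 0 and not crossed:
                crossed = z_test_p_value(conv_a, i, conv_b, i) < alpha
        peeking_hits += crossed
        final_hits += z_test_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha
    return peeking_hits / n_tests, final_hits / n_tests


if __name__ == "__main__":
    peeking, final = simulate()
    print(f"false positive rate, stopping at first p < 0.05: {peeking:.3f}")
    print(f"false positive rate, single look at the end:     {final:.3f}")
```

The single final look should land near the nominal 5%, while stopping at the first interim look that crosses the threshold should come out noticeably higher, since each extra look is another chance to cross by luck.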
A key point here is that p-values optimize for detecting effects, and only if you do everything right, which, as you point out, is not common.
> Thompson/multi-armed bandit optimizes for outcome over the duration of the test.
Exactly.
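
For anyone who hasn't seen it, "progressively altering the treatment %" can be sketched roughly like this. It is a minimal Beta-Bernoulli Thompson sampling loop, assuming binary conversions and a uniform Beta(1, 1) prior; `thompson_sampling` and the example rates are made up for illustration.

```python
import random


def thompson_sampling(true_rates, n_visitors=10000, seed=0):
    """Assign visitors to arms by Thompson sampling.

    true_rates holds hypothetical per-arm conversion rates, which are
    unknown in a real test and used here only to simulate visitors.
    Returns (visitors assigned per arm, total conversions).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_visitors):
        # Draw one plausible conversion rate per arm from its posterior,
        # Beta(1 + successes, 1 + failures), then show the arm whose draw
        # is highest. Uncertain arms still get explored occasionally.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i])
                 for i in range(n_arms)]
        arm = max(range(n_arms), key=draws.__getitem__)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, sum(successes)


if __name__ == "__main__":
    pulls, conversions = thompson_sampling([0.05, 0.10])
    print("visitors per arm:", pulls)
    print("total conversions:", conversions)
```

The traffic split is never set explicitly. It falls out of the posteriors: as evidence accumulates for the better arm, its draws win more often and it receives more visitors, which is where the better outcome during the test comes from.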