by norkakn
3721 days ago
The basic model that Optimizely uses is a Z-test approximation of a binomial distribution. To run a proper experiment with that model, you should calculate the sample size ahead of time and then run the experiment until you reach it. Each visitor should be independent, and not affected by things like the day of the week or the time of day. The end result tells you whether the distributions are different, but says less than one would think about the size of the difference. The reported significance also can't be 100%. The normal distribution has an infinite range, so a finite limit can never capture 100% of it.

Optimizely is in a rough spot. People don't like having to think through experimental design, and they are really, really bad at reasoning about p-values. To try to fix the people part, they came out with the sequential stopping rule stuff (their "stats engine"), but they never really published much justifying it. The alternative would be to move the experiments into a Bayesian framework, but that has a lot of its own problems. When they acquired Synference, that was one of the likely directions to take (along with offering bandits), but that didn't work out and those guys have since left.
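To make the "calculate the sample size ahead of time" step concrete, here is a minimal sketch of the textbook fixed-horizon two-proportion Z-test. It is not Optimizely's code; the function names, the 5% significance level, and the 80% power default are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_base, p_variant, alpha=0.05, power=0.8):
    """Visitors needed in EACH arm for a two-sided two-proportion Z-test.

    alpha and power are illustrative defaults, not values from the thread.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_base + p_variant) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_base * (1 - p_base) + p_variant * (1 - p_variant))
    ) ** 2
    return math.ceil(numerator / (p_base - p_variant) ** 2)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) using the pooled standard error.

    Assumes at least one conversion and one non-conversion overall,
    otherwise the standard error is zero.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Detecting a lift from a 10% to a 12% conversion rate.
    n = sample_size_per_arm(0.10, 0.12)
    print("visitors per arm:", n)
    # Analyse once, after the planned sample has been collected.
    print(two_proportion_z_test(conv_a=380, n_a=n, conv_b=460, n_b=n))
```

The p-value is only valid if you look once, at the planned sample size. Checking repeatedly and stopping at the first "significant" result is what sequential stopping rules are meant to correct for.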
Optimizely has a bandit-based 'traffic auto-allocation' feature in production on select enterprise plans [1]. Bandits are excellent in a wide range of situations and have many advantages, but like anything they have design parameters, and there are some caveats you have to be aware of to make sure you are using them effectively.
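For readers who haven't seen a bandit before, here is a rough sketch of one common approach, Thompson sampling with Beta posteriors. I don't know which algorithm the auto-allocation feature actually uses, so treat this purely as an illustration; the function names, the uniform Beta(1, 1) prior, and the simulated conversion rates are all assumptions.

```python
import random


def thompson_pick(successes, failures, rng=random):
    """Choose an arm by sampling each arm's Beta posterior and taking the max.

    Uses a uniform Beta(1, 1) prior per arm (an illustrative choice).
    """
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def simulate(true_rates, visitors=10_000, seed=0):
    """Simulate visitors and return how many were sent to each arm."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(visitors):
        arm = thompson_pick(successes, failures, rng)
        if rng.random() < true_rates[arm]:  # simulated conversion
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Hypothetical conversion rates; traffic should drift toward the better arm.
    print(simulate([0.05, 0.20], visitors=5000, seed=1))
```

One of the caveats shows up directly in the sketch: the losing arm receives very little traffic, so its conversion rate is estimated poorly, and the simulation assumes rates that do not change over time.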
On Frequentist and Bayesian: Optimizely's stats engine combines elements of both Frequentist and Bayesian statistics. They have a blog post that tries to touch on this issue [2]. But this is subtle stuff, and there are a lot of trade-offs and different perspectives; look at the Bayesian/frequentist debate, which has been going on for decades among statisticians.
But, FWIW, I definitely saw Optimizely as an organisation make a big investment to produce a stats engine that had the right trade-offs for how their customers were trying to test, and I think the end result was way more suitable than 'traditional' statistics were.
[1] https://help.optimizely.com/hc/en-us/articles/200040115-Traf... "Traffic Auto-allocation automatically adjusts your traffic allocation over time to maximize the number of conversions for your primary goal. [...] To learn more about how algorithms like this work, you might want to read about a popular statistics problem called the “multi-armed bandit.”"
[2] https://blog.optimizely.com/2015/03/04/bayesian-vs-frequenti... "Yet as we developed a statistical model that would more accurately match how Optimizely’s customers use their experiment results to make decisions (Stats Engine), it became clear that the best solution would need to blend elements of both Frequentist and Bayesian methods to deliver both the reliability of Frequentist statistics and the speed and agility of Bayesian ones."