by sl8r
2903 days ago
> Why would that be wrong?
The issue is that modeling B with a distro centered around 2.5% ignores what we know about the historical conversion rate (2.0%) and the control bucket's conversion rate (also 2.0%). If our goal is to make the best estimate for the future that we can, we should take this data into account when evaluating B.
As a thought experiment, imagine that you have A at 2.0% and B at 2.5% conversion for Week 1, with a historical conversion rate of 2.0%. Someone says they'll pay you $100 if you correctly guess what B's conversion rate will be next week, either (i) in the range 2.0% to 2.5%, or (ii) in the range 2.5% to 3.0%. I'd prefer to bet on (i) than on (ii).
> What would a Bayesian conclude instead?
One simple approach would just be to start with a more informative prior, like Beta(2+1,100-2+1) instead of Beta(1,1). This would pull bucket B's posterior distribution closer to 2.0%. Another approach is to use a hierarchical model [1], which will fit the individual buckets' priors for you.
[1] Here's something I wrote on this a couple years ago, more focused on solving multiple comparisons problems but with the same proposed solution: http://normal-extensions.com/2014/07/16/ab-testing-hierarchi...
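To make the prior comparison concrete, here is a minimal sketch of the Beta-Binomial update for bucket B. The visitor and conversion counts are hypothetical (the thread gives only rates, not sample sizes), and the helper name `beta_posterior_mode` is made up for illustration. It compares posterior modes, since the mode of Beta(2+1,100-2+1) sits at 2.0%.

```python
def beta_posterior_mode(prior_a, prior_b, conversions, visitors):
    """Mode (MAP estimate) of the Beta posterior for a conversion rate."""
    # Conjugate update: add successes to a, failures to b.
    a = prior_a + conversions
    b = prior_b + (visitors - conversions)
    # The mode formula (a - 1) / (a + b - 2) holds when a > 1 and b > 1.
    return (a - 1) / (a + b - 2)


# Hypothetical week-1 data for bucket B: 25 conversions in 1,000 visitors (2.5%).
conversions, visitors = 25, 1000

flat = beta_posterior_mode(1, 1, conversions, visitors)
informative = beta_posterior_mode(2 + 1, 100 - 2 + 1, conversions, visitors)

print(f"Beta(1,1) prior:  posterior mode {flat:.4%}")
print(f"Beta(3,99) prior: posterior mode {informative:.4%}")
```

Under these assumed counts the informative prior moves the estimate only slightly below 2.5%; how far it moves depends on how much data B has relative to the roughly 100 pseudo-observations in the prior.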
Both the historical and the control bucket used version A of the website, and they are consistent in their 2.0% conversion rate. Version B is different, and it appears to have a different conversion rate of 2.5%. So why should it not have a future conversion rate close to 2.5%?
Let's replace the website with a 6-sided die. Historically, the probability of throwing a 3 was 1/6. Now you replace your die with a different die and throw it 10,000 times; the 3 comes up 2560 times. If I had to guess how many times the 3 comes up in the next 10,000 throws, I would certainly bet that it's closer to 2560 times than to 1667 times.
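A rough sketch of the arithmetic behind the die example, assuming throws are independent and using the historical 1/6 as the null probability. It measures how many binomial standard deviations separate 2560 from the fair-die expectation; the variable names are only illustrative.

```python
import math

throws = 10_000
threes = 2560
p_fair = 1 / 6  # historical probability of throwing a 3

# Expected count and standard deviation if the new die behaved like the old one.
expected = throws * p_fair
std_dev = math.sqrt(throws * p_fair * (1 - p_fair))

# Distance of the observed count from the fair-die expectation, in standard deviations.
z = (threes - expected) / std_dev

print(f"expected {expected:.0f} threes, observed {threes}, z = {z:.1f}")
```

A gap this many standard deviations wide is why the old die's history says very little about the new die once 10,000 throws are in hand.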
> Someone says they'll pay you $100 if you correctly guess what B's conversion rate will be next week, either (i) in the range 2.0% to 2.5%, or (ii) in the range 2.5% to 3.0%.
Case A: The historical version A of the online shop had some influence on the conversion rate during the testing of version B, drawing the conversion rate of B down. This influence will fade away in the future, so B's conversion rate will be closer to [2.5%, 3.0%] than to [2.0%, 2.5%].
Case B: The historical version A of the online shop did not have any influence on the conversion rate during the testing of version B (compare the dice example above). Then both ranges are equally plausible. But "[2.0%, 2.5%] vs [2.5%, 3.0%]" is a bad dichotomy. A more relevant one would be "[1.75%, 2.25%] vs [2.25%, 2.75%]". In that case, I would bet on [2.25%, 2.75%].
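The two dichotomies in Case B can be compared with a small Monte Carlo sketch. It assumes a flat Beta(1,1) prior (that is, no influence from version A) and hypothetical counts of 250 conversions in 10,000 visitors, since the thread does not state B's sample size; `range_probability` is a made-up helper.

```python
import random


def range_probability(samples, low, high):
    """Fraction of posterior samples falling in [low, high)."""
    return sum(low <= s < high for s in samples) / len(samples)


random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data for bucket B: 250 conversions in 10,000 visitors (2.5%).
conversions, visitors = 250, 10_000

# Posterior under a flat Beta(1,1) prior is Beta(1 + successes, 1 + failures).
samples = [
    random.betavariate(1 + conversions, 1 + visitors - conversions)
    for _ in range(50_000)
]

ranges = {
    "[2.00%, 2.50%]": (0.0200, 0.0250),
    "[2.50%, 3.00%]": (0.0250, 0.0300),
    "[1.75%, 2.25%]": (0.0175, 0.0225),
    "[2.25%, 2.75%]": (0.0225, 0.0275),
}
probs = {name: range_probability(samples, lo, hi) for name, (lo, hi) in ranges.items()}

for name, p in probs.items():
    print(f"P(rate in {name}) ~ {p:.3f}")
```

With these assumed counts, the first pair of ranges should come out close to even, while the second pair should strongly favor [2.25%, 2.75%]. A smaller sample or an informative prior would shift both comparisons.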