You can just show the posterior and let your brain be the decision rule. You can visually see the difference in conversion rate and the uncertainty around it. That info makes it easy to decide whether to continue the test or stop the test and pick the best performer. Much better information to base a decision on than a hypothesis test with a significance threshold that people pull out of their ass.
If you want to be fancy, you could even implement a strategy that maximizes total conversions based on Bayesian decision theory, so that it automatically tends to show the best performer as time goes on.
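One common way to get that kind of behaviour is Thompson sampling. It is a heuristic, not the exact Bayes-optimal policy, so take this as a rough sketch. The uniform Beta(1, 1) priors, the simulated `true_rates`, and the function name are assumptions made up for illustration.

```python
import random

def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Thompson sampling over Bernoulli variants.

    true_rates is a hypothetical list of real conversion rates, used only
    to simulate visitors; a live test would observe conversions instead.
    Returns how many times each variant was shown.
    """
    rng = random.Random(seed)
    # Assumed prior: Beta(1, 1), i.e. uniform, for every variant.
    params = [[1, 1] for _ in true_rates]
    shown = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw one plausible conversion rate per variant from its posterior.
        draws = [rng.betavariate(a, b) for a, b in params]
        best = max(range(len(draws)), key=draws.__getitem__)
        shown[best] += 1
        # Simulate the visitor and update only the variant that was shown.
        if rng.random() < true_rates[best]:
            params[best][0] += 1  # conversion
        else:
            params[best][1] += 1  # no conversion
    return shown

shown = thompson_sampling([0.05, 0.30], rounds=2000)
```

With rates this far apart, nearly all traffic should end up on the second variant after a short exploration phase.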
That article is weird. It uses a normal distribution as the prior for the conversion rate. That could produce a negative conversion rate or a conversion rate above 100%. Then in the section "So why doesn’t everyone already do this?" they say "The answer is simple - it’s computationally inefficient.". No shit if you are using a normal prior. A much better way to do this is to use a beta prior (or a Dirichlet prior in case you have more than 2 alternatives). Then the math becomes trivial & fast and you don't have nonsense negative or above 100% conversion rates.
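To show why the beta prior keeps the math cheap: the posterior after binomial data is again a beta distribution, so updating is just two additions. A minimal sketch, assuming a uniform Beta(1, 1) prior and made-up counts:

```python
def update_beta(a, b, conversions, visitors):
    """Posterior Beta parameters after observing binomial data."""
    return a + conversions, b + (visitors - conversions)

def beta_mean(a, b):
    """Mean of Beta(a, b); always lies strictly between 0 and 1."""
    return a / (a + b)

# Hypothetical data: 12 conversions out of 100 visitors.
a, b = update_beta(1, 1, conversions=12, visitors=100)
posterior_mean = beta_mean(a, b)
```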
I didn't say hypothesis test, I said decision rule. The method I describe in the article has only two quantities "pulled out of the ass" - the threshold of caring and the prior. If you visually inspect the posterior, you are implicitly pulling out of your ass an unknown "threshold of visual similarity".
> That article is weird. It uses a normal distribution as the prior for the conversion rate.
That's incorrect. From the article: "To begin we will choose a Beta distribution prior." The computational intensiveness is not caused by the choice of prior, it's caused by the need to evaluate an integral over the joint posterior.
A Dirichlet prior is also not what you'd use for more than 2 alternatives - you have two beta distributions, one representing the posterior for the control and the other for the variation. If you had a second variation, you'd have 3 beta distributions, and you'd need to evaluate a 3 dimensional integral.
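For a rough feel of that multi-dimensional integral, here is a sketch that swaps numerical integration for plain Monte Carlo sampling (not necessarily what the article does). It estimates the probability that each variant has the highest conversion rate. The three posteriors are invented numbers.

```python
import random

def prob_each_is_best(posteriors, samples=20000, seed=0):
    """Estimate P(variant i has the highest rate) for independent Betas.

    posteriors is a list of (a, b) Beta parameters, one pair per variant.
    """
    rng = random.Random(seed)
    wins = [0] * len(posteriors)
    for _ in range(samples):
        # One joint draw: a conversion rate for every variant.
        draws = [rng.betavariate(a, b) for a, b in posteriors]
        wins[draws.index(max(draws))] += 1
    return [w / samples for w in wins]

# Hypothetical posteriors for control and two variations.
probs = prob_each_is_best([(20, 180), (30, 170), (60, 140)])
```

The cost grows linearly with the number of variants, at the price of sampling noise that shrinks like 1/sqrt(samples).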
> I didn't say hypothesis test, I said decision rule.
I did not say that you said hypothesis test.
> If you visually inspect the posterior, you are implicitly pulling out of your ass an unknown "threshold of visual similarity".
Yes, but you are "implicitly pulling a number out of your ass" based on a lot more information. When you ask somebody to come up with a mechanical decision rule before seeing the posterior, it's unlikely that you will get as good a decision as when you just show them the posterior.
> That's incorrect. From the article: "To begin we will choose a Beta distribution prior." The computational intensiveness is not caused by the choice of prior, it's caused by the need to evaluate an integral over the joint posterior.
Ah, I was confused because they are specifying the prior in terms of a mean and standard deviation. That is a very weird way to represent a beta distribution.
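For reference, going from a mean and standard deviation to the usual (a, b) parameters is a method-of-moments calculation. A small sketch (the function name is my own, not from the article):

```python
def beta_from_mean_std(mean, std):
    """Return (a, b) of the Beta distribution with the given mean and std.

    Only possible when 0 < mean < 1 and 0 < std**2 < mean * (1 - mean).
    """
    var = std ** 2
    if not 0 < mean < 1 or not 0 < var < mean * (1 - mean):
        raise ValueError("no Beta distribution has this mean and std")
    nu = mean * (1 - mean) / var - 1  # equals a + b
    return mean * nu, (1 - mean) * nu

# The uniform distribution on [0, 1] has mean 1/2 and variance 1/12,
# so this should recover roughly Beta(1, 1).
a, b = beta_from_mean_std(0.5, (1 / 12) ** 0.5)
```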
> The computational intensiveness is not caused by the choice of prior, it's caused by the need to evaluate an integral over the joint posterior.
I see, they are computing expected_value(max(ctr[A]-ctr[B], 0.0)). That is still weird though. What you want to know is if it's worth it to run the test another time. So you want to compare E(final conversion rate if stop now) with E(final conversion rate if run another time), and if the latter is not much greater than the former you stop the test. Both of those have a closed form. Even better would be to compare E(final conversion rate if stop now) and E(final conversion rate if we test A) and E(final conversion rate if we test B). Then you would also automatically decide the best version to show (e.g. if the uncertainty about A is small and the uncertainty about B is big, you'll show B).
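As a sanity check on what expected_value(max(ctr[A]-ctr[B], 0.0)) measures, here is a Monte Carlo sketch of it. This is sampling rather than the integral the article evaluates, and the posteriors are invented numbers.

```python
import random

def expected_loss(post_a, post_b, samples=20000, seed=0):
    """Monte Carlo estimate of E[max(ctr_a - ctr_b, 0)].

    Read it as the conversion rate you expect to give up by picking B
    when A might actually be better. post_a and post_b are (a, b) pairs.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        ctr_a = rng.betavariate(*post_a)
        ctr_b = rng.betavariate(*post_b)
        total += max(ctr_a - ctr_b, 0.0)
    return total / samples

# Hypothetical posteriors: A is around 10%, B is around 15%.
loss_if_choose_b = expected_loss((20, 180), (30, 170))
loss_if_choose_a = expected_loss((30, 170), (20, 180))
```

Choosing the apparent winner B carries a much smaller expected loss than choosing A, which is the comparison a "threshold of caring" gets applied to.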
> A Dirichlet prior is also not what you'd use for more than 2 alternatives
Hm? Let's say you have a free plan, basic plan, and enterprise plan. This is a very common scenario in practice. A Dirichlet prior would be the natural thing to use here, IMO.
> E(final conversion rate if run another time)... Both of those have a closed form.
I'm curious - where can I learn more?
> Let's say you have a free plan, basic plan, and enterprise plan... A Dirichlet prior would be the natural thing to use here, IMO.
This would be handled via a Dirichlet prior, with the resulting plan probabilities then multiplied by each plan's LTV (lifetime value). I thought you were referring to multiple variants - i.e., landing page A, landing page B, landing page C.
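A sketch of that plan-mix calculation, assuming a flat Dirichlet(1, 1, 1) prior over free/basic/enterprise. The signup counts and LTV figures are invented for illustration.

```python
def expected_value_per_signup(prior, counts, ltv):
    """Expected LTV per signup under a Dirichlet posterior over plans.

    The posterior is Dirichlet(prior + counts). Expected value is linear
    in the plan probabilities, so the posterior mean gives it exactly.
    """
    posterior = [p + c for p, c in zip(prior, counts)]
    total = sum(posterior)
    shares = [x / total for x in posterior]
    return sum(share * value for share, value in zip(shares, ltv))

# Hypothetical: 70 free, 25 basic, 5 enterprise signups,
# with assumed LTVs of 0, 100 and 1000 per signup.
value = expected_value_per_signup([1, 1, 1], [70, 25, 5], [0, 100, 1000])
```

Running this once per landing page variant gives one expected value per variant to compare.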
Suppose you have Beta(a1,b1) and Beta(a2,b2) at the current step. The expected conversion rates are:
M(a,b) = a/(a+b)
E1 = M(a1,b1)
E2 = M(a2,b2)
If we stop now the expected conversion rate is E = max(E1,E2).
If we continue for another timestep with option 1, then the question is whether that can make us switch from 1 to 2 or from 2 to 1. If it can't, then the expected conversion rate is the same whether or not we execute one more step. Let's assume without loss of generality that option 2 is currently winning, but that option 1 would be winning if it got another conversion. So the new expected conversion rate is:
E' = int(p_1(r)*(r*r + (1-r)*E2), r=0..1)
where p_1 is the probability density of Beta(a1,b1). All the moments of the beta distribution have a closed form, so E' also has a closed form.
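To make the closed form concrete, here is a sketch that computes the one-step lookahead two ways. `one_step_lookahead` is a slightly more general version that takes the better posterior mean after either outcome (my own generalization, not part of the derivation above). `source_form` is the integral written out with the beta second moment a(a+1)/((a+b)(a+b+1)). The parameter values are made up so that the stated assumption holds.

```python
def beta_mean(a, b):
    return a / (a + b)

def beta_second_moment(a, b):
    """E[r^2] for r ~ Beta(a, b)."""
    return a * (a + 1) / ((a + b) * (a + b + 1))

def one_step_lookahead(a1, b1, a2, b2):
    """Expected final rate after one more trial of option 1."""
    e1, e2 = beta_mean(a1, b1), beta_mean(a2, b2)
    # A conversion happens with probability e1; afterwards we pick
    # whichever option has the higher posterior mean.
    after_success = max(beta_mean(a1 + 1, b1), e2)
    after_failure = max(beta_mean(a1, b1 + 1), e2)
    return e1 * after_success + (1 - e1) * after_failure

# Hypothetical state: option 2 leads (0.52 vs 0.50), but one more
# conversion would put option 1 ahead (6/11 > 0.52).
a1, b1, a2, b2 = 5, 5, 52, 48
stop_now = max(beta_mean(a1, b1), beta_mean(a2, b2))
lookahead = one_step_lookahead(a1, b1, a2, b2)
source_form = (beta_second_moment(a1, b1)
               + (1 - beta_mean(a1, b1)) * beta_mean(a2, b2))
```

Under the assumption above, the two expressions agree, and the gap between `lookahead` and `stop_now` is what one more trial is worth.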
You could generalize this to running it n more times instead of one more time; you'd get an expression of the form:
E = int(p_1(r1)*p_2(r2)*polynomial(r1,r2))
I suspect that also has a closed form but I'm not sure at first glance.