by equark 5388 days ago
No, this is actually a common misunderstanding, and it gets to the heart of the difference between conditioning on the data and considering the sampling process. At the 70th flip your best guess is 57%, given a uniform prior. It's perfectly fine to stop based on the results you have; that doesn't change the likelihood of seeing what you saw. Imagine looking after every flip: clearly your best guess is the sample mean unless you have prior knowledge.
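
To make the 57% concrete, here is a minimal Python sketch. It assumes 40 heads in 70 flips (40/70 ≈ 0.571); the comment doesn't give the raw count. Under a uniform Beta(1, 1) prior the posterior is Beta(1 + heads, 1 + tails), and its mode is exactly the sample proportion. Because the likelihood p^40 (1 - p)^30 never mentions the stopping rule, the same numbers come out whether 70 flips were planned in advance or you stopped at 70 because the results looked interesting.

```python
def posterior_uniform_prior(heads: int, tails: int) -> dict:
    """Beta posterior summaries for a coin's heads probability under a uniform Beta(1, 1) prior."""
    a, b = 1 + heads, 1 + tails          # posterior is Beta(a, b)
    return {
        # MAP estimate = heads / (heads + tails); valid when heads >= 1 and tails >= 1
        "mode": (a - 1) / (a + b - 2),
        # Posterior mean = (heads + 1) / (n + 2), slightly shrunk toward 0.5
        "mean": a / (a + b),
    }

# Hypothetical data: 40 heads out of 70 flips.
summary = posterior_uniform_prior(heads=40, tails=30)
print(round(summary["mode"], 3), round(summary["mean"], 3))  # → 0.571 0.569
```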

What's confusing is thinking about the sampling distribution. But what might have happened in some other world is of no consequence if you condition on the data rather than the parameter.

This is the likelihood principle (http://en.wikipedia.org/wiki/Likelihood_principle). See the example there and how it relates to sequential trials. It's actually rather deep. Other good links are:

http://books.google.com/books?id=_ravDT9e8nMC&lpg=PA17&#...

http://books.google.com/books?id=oY_x7dE15_AC&lpg=PA27&#...

http://projecteuclid.org/DPubS?service=UI&version=1.0&#3...
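
A standard illustration of the likelihood principle with sequential trials, sketched in Python. The data are assumed for illustration (9 heads, 3 tails) and are not from this thread. Under a fixed-n design (flip 12 times) the head count is binomial; under a "flip until the third tail" design it is negative binomial. The two likelihoods differ only by a constant factor, so any Bayesian posterior is identical, yet the one-sided p-values for H0: p = 0.5 land on opposite sides of 0.05.

```python
from math import comb

HEADS, TAILS = 9, 3  # illustrative data, not from the thread

def binomial_lik(p: float) -> float:
    # Design 1: flip exactly HEADS + TAILS times and count heads.
    return comb(HEADS + TAILS, HEADS) * p**HEADS * (1 - p)**TAILS

def neg_binomial_lik(p: float) -> float:
    # Design 2: flip until the TAILS-th tail; the final flip is always a tail.
    return comb(HEADS + TAILS - 1, HEADS) * p**HEADS * (1 - p)**TAILS

# The ratio is the same constant for every p, so both designs carry identical evidence about p.
ratios = [binomial_lik(p) / neg_binomial_lik(p) for p in (0.2, 0.5, 0.75, 0.9)]

# One-sided p-values for H0: p = 0.5 vs p > 0.5 ("at least this many heads").
n = HEADS + TAILS
p_binomial = sum(comb(n, h) for h in range(HEADS, n + 1)) / 2**n
# Negative binomial: P(h heads before the TAILS-th tail) = C(h + TAILS - 1, h) / 2**(h + TAILS).
p_neg_binomial = 1 - sum(comb(h + TAILS - 1, h) / 2**(h + TAILS) for h in range(HEADS))

print(ratios)                                          # → [4.0, 4.0, 4.0, 4.0]
print(round(p_binomial, 4), round(p_neg_binomial, 4))  # → 0.073 0.0327
```

Same data, same posterior, different p-values: the only thing that changed is what might have happened in the other world.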


My only point is that any kind of analysis has to be careful about the way its mathematical assumptions relate to how the real-life experiment is conducted.

I'm not even going near the question of whether the Bayesian approach is "better" than the frequentist approach.

I was trying to point out that the frequentist analysis in the OP does make an assumption about the nature of the experiment (that you will run exactly N trials), and that if you break that assumption by stopping the test at some N' < N because the answers are looking good, you'd better understand that your earlier analysis no longer applies.
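
The cost of breaking the fixed-N assumption is easy to see by simulation. A minimal Python sketch with hypothetical parameters: flip a fair coin (so the null hypothesis is true) and compare a single two-sided z-test at N = 200 with a procedure that runs the same test every 10 flips and stops at the first |z| > 1.96. The peeking procedure's false-positive rate comes out well above the nominal 5%.

```python
import math
import random

def z_stat(heads: int, n: int) -> float:
    """Normal-approximation z statistic for H0: p = 0.5 after n flips."""
    return (heads - n / 2) / math.sqrt(n / 4)

def false_positive_rates(n_sims=2000, n_max=200, peek_every=10, z_crit=1.96, seed=1):
    """Estimate false-positive rates of a fixed-N test and a 'peek and stop' test when H0 is true."""
    rng = random.Random(seed)
    fixed_hits = peek_hits = 0
    for _ in range(n_sims):
        heads = 0
        crossed = False  # did any interim look show |z| > z_crit?
        for n in range(1, n_max + 1):
            heads += 1 if rng.random() < 0.5 else 0  # fair coin, so H0 is true
            if n % peek_every == 0 and abs(z_stat(heads, n)) > z_crit:
                crossed = True  # a peeker would stop here and declare a winner
        fixed_hits += abs(z_stat(heads, n_max)) > z_crit  # only the planned final look counts
        peek_hits += crossed
    return fixed_hits / n_sims, peek_hits / n_sims

fixed_rate, peeking_rate = false_positive_rates()
print(f"fixed N: {fixed_rate:.3f}  peeking every 10 flips: {peeking_rate:.3f}")
```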

And in another reply, I wanted to add that there is a frequentist answer (Wald's sequential probability ratio test) to the practical question: can you widen the scope of the analysis so that I can stop early if the results point strongly in one direction?
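
Wald's sequential probability ratio test (SPRT) is the classic frequentist design built for early stopping. A minimal Python sketch for a Bernoulli rate, testing a hypothetical H0: p = 0.5 against H1: p = 0.6 with alpha = 0.05 and beta = 0.2, using Wald's approximate boundaries log((1 - beta) / alpha) and log(beta / (1 - alpha)). The parameter values are illustrative assumptions.

```python
import math
from typing import Iterable, Tuple

def sprt_bernoulli(outcomes: Iterable[int], p0=0.5, p1=0.6,
                   alpha=0.05, beta=0.2) -> Tuple[str, int]:
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on a stream of 0/1 outcomes."""
    upper = math.log((1 - beta) / alpha)   # crossing above: reject H0 (accept H1)
    lower = math.log(beta / (1 - alpha))   # crossing below: accept H0
    llr = 0.0  # running log-likelihood ratio log[L(p1) / L(p0)]
    n = 0
    for n, x in enumerate(outcomes, start=1):
        # Contribution of one observation to the log-likelihood ratio.
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n  # no boundary crossed yet: keep sampling

print(sprt_bernoulli([1] * 30))  # → ('accept H1', 16)
print(sprt_bernoulli([0] * 30))  # → ('accept H0', 7)
```

The boundaries are fixed before any data arrive, which is what lets the test stop early without inflating its error rates.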

Being sure that your assumed sampling distribution matches the actual experiment is key, even in the Bayesian case.

My graduate statistics class was taught from Berger, your second link, so I'm broadly sympathetic to "The Bayesian Choice", but, more importantly, I wanted to give some usable insight to someone who just wants to run an A/B test.

Yes, examining the data will distort the sampling distribution and invalidate the standard Wald test. But in the A/B testing context it's absurd to advocate not acting on your data. Of course, it's also absurd to look at conventional p-values if you do. So it's a bit of a Catch-22.

All this confusion goes away if you realize you are interested in p(lift | data) rather than p(data | lift = 0). The sampling distribution (the distribution of the statistic under repeated sampling, p(data | lift = 0)) does not play a role in Bayesian statistics. Obviously the model (likelihood and prior) does, but it doesn't include the experimental procedure, provided that the decision to stop depends only on the observed data.
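
As a sketch of computing p(lift | data) directly, here is a Beta-Binomial model in Python with independent uniform Beta(1, 1) priors on each arm's conversion rate. The counts are hypothetical, and relative lift is defined here as (rate_B - rate_A) / rate_A. The posterior depends only on the observed conversions and trials, not on when or why data collection stopped.

```python
import random
from statistics import median

def lift_posterior(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo draws from p(lift | data) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)  # posterior draw for arm A
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)  # posterior draw for arm B
        lifts.append((rate_b - rate_a) / rate_a)                # relative lift of B over A
    return lifts

# Hypothetical A/B data: 120/1000 conversions on A, 150/1000 on B.
lifts = sorted(lift_posterior(120, 1000, 150, 1000))
prob_b_better = sum(x > 0 for x in lifts) / len(lifts)
interval = (lifts[int(0.025 * len(lifts))], lifts[int(0.975 * len(lifts))])
print(f"P(lift > 0 | data) = {prob_b_better:.3f}, median lift = {median(lifts):.3f}")
print(f"95% credible interval for lift: ({interval[0]:.3f}, {interval[1]:.3f})")
```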

A/B testing, as a decision procedure, is an area where I don't think the standard frequentist vs. Bayesian debate applies. The Bayesian decision rule is the only profit-maximizing solution. That said, I'm sympathetic to being practical. But all the confusion and conflicting advice related to A/B testing stems directly from trying to fit it into a frequentist frame.
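
One common way to make a Bayesian decision rule concrete is to ship the variant with the smallest posterior expected loss, i.e. the conversion rate you expect to give up by choosing it. A minimal Python sketch under the same hypothetical Beta(1, 1) model; the counts and the 0.001 stopping threshold are illustrative assumptions, not a standard.

```python
import random

def expected_losses(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Posterior expected loss (in conversion rate) of shipping A or B, Beta(1, 1) priors."""
    rng = random.Random(seed)
    loss_a = loss_b = 0.0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        loss_a += max(rate_b - rate_a, 0.0)  # forgone if we ship A and B is actually better
        loss_b += max(rate_a - rate_b, 0.0)  # forgone if we ship B and A is actually better
    return loss_a / draws, loss_b / draws

THRESHOLD = 0.001  # hypothetical: stop once the best choice risks < 0.1 percentage points

loss_a, loss_b = expected_losses(120, 1000, 150, 1000)
choice = "A" if loss_a < loss_b else "B"
decision = f"ship {choice}" if min(loss_a, loss_b) < THRESHOLD else "keep testing"
print(f"E[loss | A] = {loss_a:.4f}, E[loss | B] = {loss_b:.4f} -> {decision}")
```

Multiplying both losses by a (hypothetical) value per conversion turns them into expected profit forgone. Since loss_a - loss_b equals the posterior expected difference in conversion rates, picking the smaller expected loss is the same as picking the arm with the higher posterior expected profit when value is linear in conversions.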