by jfarmer 4915 days ago
"I did an A/A test, basically testing the same exact page -- expecting the results would be the same."

That's not how A/B testing works. :)

Let's say we want to detect a 1% lift in some metric at 95% confidence and we set up an A/A test. We do the math and it tells us we need to sample 1,000 people to reach 95% confidence on a 1% lift.

If we ran the A/A test 100 times, roughly 5 of them would show a statistically significant difference between the two groups, even though no real difference exists. That's what "95% confidence" means -- your false positive rate is 5%. A false positive of this kind is called a Type I error.

http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
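
If it helps to see it rather than take it on faith, here's a minimal simulation sketch in Python. The assumptions are mine, not the original poster's: a made-up 10% baseline conversion rate, 1,000 visitors per arm, and a pooled two-proportion z-test standing in for whatever test the software actually runs. Both arms draw from the same rate, so every "significant" result is a false positive by construction.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # zero (or total) conversions in both arms: nothing to distinguish
    z = (conv_a / n_a - conv_b / n_b) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def aa_false_positive_rate(n_tests=1000, n_per_arm=1000, base_rate=0.10,
                           alpha=0.05, seed=0):
    """Run many A/A tests and return the share flagged as significant.

    base_rate and n_per_arm are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_tests):
        # Both arms see the identical page, so they share one true rate.
        conv_a = sum(rng.random() < base_rate for _ in range(n_per_arm))
        conv_b = sum(rng.random() < base_rate for _ in range(n_per_arm))
        if two_proportion_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha:
            false_positives += 1
    return false_positives / n_tests


rate = aa_false_positive_rate()
print(f"share of A/A tests flagged significant: {rate:.3f}")
```

The printed share should land somewhere near 0.05 and will wobble with the seed and the number of simulated tests. That wobble is the point: a single A/A test coming up "significant" is unremarkable.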

You could run a kind of meta-analysis, using the false positive rate as the variable you're measuring, to see whether there's a statistically significant difference between the 5% false positive rate you expect and the false positive rate the A/B testing software generates in practice.

In this case, your null hypothesis is that the "true alpha" of the A/B testing software is 0.05. You'd sample from among all the 95% confidence tests you run and see whether you can reject the null hypothesis.
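
A rough sketch of that check, assuming you've run some number of A/A tests and counted how many the software flagged as significant. This uses an exact two-sided binomial test against a null of alpha = 0.05, which is one reasonable choice among several. The counts in the usage lines (5 and 15 flagged out of 100) are invented for illustration.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_p_value(flagged, n_tests, alpha0=0.05):
    """Exact two-sided binomial test of H0: true false positive rate == alpha0.

    Sums the probability of every outcome that is no more likely than the
    observed one under H0.
    """
    observed = binom_pmf(flagged, n_tests, alpha0)
    cutoff = observed * (1 + 1e-7)  # small tolerance for floating-point ties
    pmfs = [binom_pmf(i, n_tests, alpha0) for i in range(n_tests + 1)]
    return min(1.0, sum(p for p in pmfs if p <= cutoff))


# Hypothetical counts: 5 flagged out of 100 is right on target,
# while 15 flagged out of 100 would be hard to square with alpha = 0.05.
for flagged in (5, 15):
    p = binomial_test_p_value(flagged, 100)
    print(f"{flagged}/100 flagged -> p-value {p:.4g}")
```

A small p-value here would be evidence that the software's real false positive rate isn't the 5% it advertises. A large one just means you can't tell the difference from 5% with the tests you've run.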


There are a number of valid ways to run and analyze A/B tests. Bayesian approaches look rather different from what you're describing.

The original commenter was using off-the-shelf A/B testing software, so the odds of it doing anything other than a simple t-test are virtually zero. I'm not sure the frequentist vs. Bayesian debate is the most relevant thing for him right now.

I felt it best to leave out nuance that didn't help him understand why his software was showing a statistically significant outcome for an A/A test.

You are assuming the framework isn't broken or misconfigured.

You're not going to out-pedant me, dammit! :P

Given the original comment, yes, I think it's more likely he just didn't understand why an A/A test might sometimes show a false positive.

Even if the system were misconfigured, there's no reason to think it would manifest as a false positive in an A/A test. There are lots of ways it could manifest.