

	by ugh 5804 days ago
	Wait, so people who do A/B tests didn’t already do that? It drives me absolutely crazy when I don’t have any measure to assess how likely or unlikely it is for some difference to be random.

3 comments

btilly 5804 days ago

No, people who do A/B tests have known this for years. It is the wannabes who haven't sat down and figured out the statistics who run into trouble. See http://elem.com/~btilly/effective-ab-testing/ for an OSCON tutorial that I did on the topic a couple of years ago, which includes all the gory statistical detail you could want.

Furthermore, I note with interest that 2 of the 3 statistical techniques he named (Student's t-test and ANOVA) only apply to cases where the observed variables are themselves normally distributed, which is not a good description of binary yes/no outcomes. As for the remaining test, it is appropriate to use a chi-square, but statisticians tell us that the g-test is preferable.


sesqu 5804 days ago

I don't see the problem. The total is very nearly normally distributed by the central limit theorem, is it not?


btilly 5803 days ago

The total is indeed nearly normally distributed, but the rate of convergence (particularly in the tails) is not fast enough to avoid having those very sensitive tests give wrong results.

Were it otherwise there would have been no need to develop the chi-square test. It would have been entirely redundant. (It actually is redundant because we have the g-test. But evaluating the chi-square test just involves taking squares, while the g-test involves taking natural logarithms. This made the less accurate chi-square test much easier to do when people didn't have computers to calculate it on. Today we should use the g-test, but few people have heard of it.)
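To make the "squares versus logarithms" point concrete, here is a minimal sketch of both statistics on a 2x2 table of outcomes. The counts are invented for illustration, the function names are my own, and the p-value helper only covers the 1-degree-of-freedom case that a two-variant yes/no test gives you.

```python
import math

def expected_counts(table):
    """Expected cell counts if rows and columns were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def g_statistic(table):
    """G statistic: 2 * sum of observed * ln(observed / expected)."""
    expected = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0  # a zero cell contributes nothing to the sum
    )

def p_value_1df(stat):
    """Upper tail of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical A/B data: [conversions, non-conversions] per variant.
observed = [[30, 970], [50, 950]]

chi2 = pearson_chi_square(observed)
g = g_statistic(observed)
print(f"chi-square = {chi2:.4f}, p = {p_value_1df(chi2):.4f}")
print(f"G          = {g:.4f}, p = {p_value_1df(g):.4f}")
```

Both statistics are compared against the same chi-square reference distribution; they differ only in how the gap between observed and expected counts is measured.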


sesqu 5803 days ago

Ah, right. I spent a while drawing up a proper plot of the likelihood of the difference and the normal approximation of the difference, and saw that the normal had too small a variance. The effect is still pretty credible in the OP, though.


sesqu 5803 days ago

Noprocrast caught out my attempt to edit. The normal variance is too large.

Here's the plot. Black for discretized (n=1000) binomial likelihood, red for normal approximation. The effect is clear, but a t-test won't show it. I'm not familiar with the theory behind the g-test, but there's clearly a lot of room for improvement at these sample sizes.

http://img693.imageshack.us/img693/4880/bindiff.png


btilly 5803 days ago

It looks like you forgot to rescale the binomial distribution.

If X_i is a series of independent, identically distributed random variables with mean m and variance v, then X_1 + X_2 + ... + X_n is approximately a normal variable with mean nm and variance vn. Therefore

(X_1 + X_2 + ... + X_n - nm)/sqrt(vn)

is approximately a standard normal.

If you draw that graph, visually the two lines should lie right on top of each other. To see the problem you need to zoom in on the tail and blow it up, and only then will you see the issues with the convergence.
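A rough way to zoom in on the tail without a plot is to compare the exact binomial upper-tail probability with the standard normal tail at the same standardized value. This is only a sketch: n = 1000 matches the plot above, but p = 0.05 is an assumed conversion rate, the function names are made up, and no continuity correction is applied.

```python
import math

def binomial_pmf(n, p, k):
    """Exact P(X = k) for X ~ Binomial(n, p), 0 < p < 1, computed in log space."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)

def binomial_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(n, p, i) for i in range(k, n + 1))

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def compare_tails(n, p, z_values):
    """For each z, return (k, standardized k, exact tail, normal tail, ratio)."""
    mean = n * p                        # n * m
    sd = math.sqrt(n * p * (1.0 - p))   # sqrt(v * n)
    rows = []
    for z in z_values:
        k = math.ceil(mean + z * sd)    # first count at least z sd above the mean
        z_k = (k - mean) / sd           # (X_1 + ... + X_n - nm) / sqrt(vn)
        exact = binomial_upper_tail(n, p, k)
        approx = normal_upper_tail(z_k)
        rows.append((k, z_k, exact, approx, approx / exact))
    return rows

# Assumed parameters: 1000 trials, 5% success rate.
for k, z_k, exact, approx, ratio in compare_tails(1000, 0.05, (1, 2, 3, 4)):
    print(f"k={k:3d} z={z_k:.3f} exact={exact:.3e} normal={approx:.3e} ratio={ratio:.3f}")
```

The ratio column is the thing to watch: the further out in the tail you look, the further it drifts from 1, even though the two curves would look the same on a full-scale plot.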


notahacker 5804 days ago

I was quite surprised to find that the linked website designed to showcase A/B tests doesn't even hint at things like statistical significance or confidence intervals for the improvements.


ovi256 5804 days ago

I think most people do not do this because they do not know it is important or, like me, they do not understand the theory behind it. I also did not know how to do it in practice.
