| I don't know how we got onto the arrival rates of our visitors. I would just like to state that Simpson's paradox is exactly why we shouldn't compare percent conversions. They are meaningless. However, many statistical tests, like Student's t-test, compensate for this paradox by including the number of samples in the test. See: http://en.wikipedia.org/wiki/Student%27s_t-test#Unequal_samp...
I think you said one thing that is at the heart of the issue: we assume that E[conversion of B] > E[conversion of A] for the entire period sampled. I think all of the details about Poisson processes are unnecessary if you simply assume that each person is drawn IID from the population. I just don't think you are answering the right questions here.
Let's assume that each person arrives IID from an infinite population. Then we have a Bernoulli process for each of the A and B queries: a "conversion" results in a 1, a failure in a 0. Since these people arrive IID, we can select a sub-sample that is also IID. We would now like to estimate the parameters of each process and/or compare the two processes. We can do this with a t-test, which gives us the statistical significance of one group having a higher "conversion rate" than the other. Note: rate does not factor into this problem at all, because we assume the participants are IID, so the t-test (used correctly, accounting for the different numbers of samples) will tell us which group's conversion rate is larger.
My question is what happens when the parameters of your A and B queries change over time. Still under the assumption that E[B] > E[A], it now matters greatly in which order you use your samples. I think the only reason you brought the Poisson model into the discussion is to weight the more recent samples higher and downweight the earlier samples in your basket of samples. This is a heuristic for considering a fixed interval in which the samples are stationary. It effectively considers a window that slides with the time of arrival. |
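The comparison described in the quote can be sketched with Welch's unequal-variances t-test, a t-test variant that allows the two groups to have different sample sizes and different variances. This is a minimal sketch, assuming each visitor's outcome is recorded as 0 (no conversion) or 1 (conversion); the function name `welch_t` and the sample data are hypothetical, and the p-value uses a large-sample normal approximation rather than the exact t distribution.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-test on two samples of 0/1 outcomes (sizes may differ).

    Returns the t statistic for mean(b) - mean(a), the Welch-Satterthwaite
    degrees of freedom, and a two-sided p-value from a normal approximation,
    which is reasonable only when both samples are large.
    Each sample needs at least two values and must not be all 0s or all 1s.
    """
    na, nb = len(a), len(b)
    se2_a = variance(a) / na  # squared standard error of A's conversion rate
    se2_b = variance(b) / nb  # squared standard error of B's conversion rate
    t = (mean(b) - mean(a)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal tail probability
    return t, df, p

a = [1] * 10 + [0] * 90    # hypothetical: 10 conversions out of 100 visitors to A
b = [1] * 30 + [0] * 170   # hypothetical: 30 conversions out of 200 visitors to B
t, df, p = welch_t(a, b)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.2f}")
```

Welch's version is chosen here because it does not assume the two groups share a variance: for 0/1 outcomes the variance is p(1 - p), so two versions with different conversion rates also have different variances.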
OK, now to what I said about Poisson distributions. Assuming that people arrive according to a Poisson process allows us to conclude two key facts:
1. The statistics will behave exactly as they would if each person arrived IID from an infinite population.
2. Simpson's paradox will not apply to the theoretical distribution of the samples for A and B.
Assuming #1 without #2 does not get you very far. But having both #1 and #2 lets us apply standard statistical tests.
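To make the role of #2 concrete, here is a small worked example with hypothetical numbers (not data from any real test). B converts better than A in each of two periods, but because A received most of its traffic in the high-converting period, A looks better once the periods are pooled. If each period's traffic is instead split evenly between the two versions, the pooled comparison agrees with the per-period one. The helper `pooled_rate` is illustrative only.

```python
from fractions import Fraction

def pooled_rate(periods):
    """Overall conversion rate from a list of (visitors, conversions) pairs."""
    visitors = sum(v for v, _ in periods)
    conversions = sum(c for _, c in periods)
    return Fraction(conversions, visitors)

# Hypothetical (visitors, conversions) for a high-converting period, then a low one.
# Skewed allocation: A gets most of its traffic early, B most of its traffic late.
skewed_a = [(900, 90), (100, 2)]    # A converts at 10%, then 2%
skewed_b = [(100, 12), (900, 27)]   # B converts at 12%, then 3%: better in both periods

# Same per-period conversion rates, but each period's traffic split evenly.
even_a = [(500, 50), (500, 10)]     # 10%, then 2%
even_b = [(500, 60), (500, 15)]     # 12%, then 3%

print(float(pooled_rate(skewed_a)), float(pooled_rate(skewed_b)))  # 0.092 0.039: A appears to win
print(float(pooled_rate(even_a)), float(pooled_rate(even_b)))      # 0.06 0.075: B wins, as in each period
```

With an even split, each pooled rate is the same weighted average of its per-period rates, so if B wins every period it must also win overall.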
I have no idea why you would speculate that I am attempting to weight recent samples more heavily and downweight earlier samples. All samples are, in fact, weighted exactly the same. That notwithstanding, different arrival times are not weighted the same, because the sampling rate fluctuates over time depending on factors such as the traffic level on your web server. But it fluctuates in an identical way for the two versions. (This fact is critical to concluding point #2.)
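The point about fluctuating traffic can be illustrated with a small simulation. This is a sketch under stated assumptions: arrivals follow a non-homogeneous Poisson process with a hypothetical daily traffic curve `rate(t)`, generated by thinning (the Lewis-Shedler method), and each arrival is independently assigned to A or B by a fair coin. The names `rate` and `simulate` and all parameter values are illustrative. Total traffic per hour swings widely, but the share going to A stays near 0.5 in every hour, so both versions see the same traffic pattern.

```python
import math
import random

def rate(t):
    """Hypothetical traffic intensity (visitors per hour) with a daily cycle."""
    return 2000 + 1500 * math.sin(2 * math.pi * t / 24)

def simulate(hours=24, rate_max=3500, p_a=0.5, seed=1):
    """Non-homogeneous Poisson arrivals via thinning; each arrival goes to
    version A with probability p_a, otherwise to B. Returns [A, B] counts per hour.
    rate_max must be an upper bound on rate(t) over the whole horizon."""
    rng = random.Random(seed)
    counts = [[0, 0] for _ in range(hours)]
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)           # next candidate from a homogeneous process
        if t >= hours:
            break
        if rng.random() < rate(t) / rate_max:    # keep candidate with prob rate(t) / rate_max
            arm = 0 if rng.random() < p_a else 1  # random assignment, independent of time
            counts[int(t)][arm] += 1
    return counts

counts = simulate()
for hour, (a, b) in enumerate(counts):
    print(hour, a + b, round(a / (a + b), 3))  # hour, total visitors, share sent to A
```

Because the assignment probability does not depend on the arrival time, the expected A/B split is the same in every hour, which rules out the skewed per-period allocation that produces Simpson's paradox.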
Does this help?