by bo1024
2771 days ago
I'm thinking of it as a statistical hypothesis test. The null hypothesis is that X and Y come from the same distribution. Under that hypothesis, there's only a 0.05 chance of seeing three X tests all below three Y tests, since XXXYYY is just one of the 20 equally likely ways to order three X's and three Y's. So if we see this, we can probably reject the null. If we think the X and Y distributions are both something like normal with similar variance, then we should also be able to say the chance of XXXYYY given Y is better than X is at most 0.05. But if the distributions for X and Y can be really different, then I think you're right: this test could be misleading! For example, say Y always takes 2 seconds, and X takes 1 second 99% of the time, but 1% of the time it takes an hour. If we run three tests of each, we'll probably only see good runs from X and conclude it's better, when on average it's not.
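To make the numbers concrete, here's a rough sketch in Python. The function names are mine, and it assumes no ties between measurements and that the slow case is exactly 3600 seconds with probability 1%. The first function counts orderings to get the 0.05 figure; the second works out X's average time and the chance that three runs of X all look fast.

```python
from fractions import Fraction
from itertools import combinations


def prob_all_x_below_y(n_x=3, n_y=3):
    """Chance that every X run ranks below every Y run under the null.

    Assumes X and Y share one continuous distribution (no ties), so every
    choice of which ranks belong to X is equally likely.
    """
    n = n_x + n_y
    total = 0
    hits = 0
    for x_ranks in combinations(range(n), n_x):
        total += 1
        if max(x_ranks) < n_x:  # X holds exactly the n_x lowest ranks
            hits += 1
    return Fraction(hits, total)


def heavy_tail_example(p_slow=Fraction(1, 100), fast=1, slow=3600, runs=3):
    """Mean time of X, and the chance that `runs` samples of X are all fast.

    Illustrative values only: X takes `fast` seconds with probability
    1 - p_slow and `slow` seconds with probability p_slow.
    """
    mean_x = (1 - p_slow) * fast + p_slow * slow
    p_all_fast = (1 - p_slow) ** runs
    return mean_x, p_all_fast


if __name__ == "__main__":
    print("P(XXXYYY | null) =", prob_all_x_below_y())
    mean_x, p_all_fast = heavy_tail_example()
    print("mean X (s) =", float(mean_x))
    print("P(three fast X runs) =", float(p_all_fast))
```

Under these assumptions X averages about 37 seconds against Y's 2, yet three runs of X all come back fast roughly 97% of the time, so the rank test would usually point the wrong way.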