| HN Mirror



	by luckyt 2768 days ago
	This is incorrect: there's no reason to expect that X and Y will each appear 3 times in 6 trials if their probabilities are equal. If all 3 measurements of X are smaller than all 3 measurements of Y, then you have X < Y with confidence 1 - 1/8 or 87.5% confidence. You'd need at least 5 measurements to be 95% confident.

2 comments

bo1024 2768 days ago

You're not considering the right probability space. We have 3 measurements of X and 3 of Y. The question is the distribution on orderings of these six measurements. If X and Y come from the same distribution then all orderings are equally likely.
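As a rough illustration of that probability space (a sketch, not part of the original comment): if ties are ignored and every ordering of the six measurements is equally likely, you can count the ways to place the X measurements among the rank positions and see how many put all of X below all of Y. The function name `p_all_x_first` is made up for this example.

```python
from itertools import combinations
from math import comb


def p_all_x_first(n_x: int, n_y: int) -> float:
    """Probability that every X ranks below every Y, assuming all
    orderings are equally likely and there are no ties."""
    total = n_x + n_y
    # Each ordering is determined by which rank positions the X's occupy.
    placements = list(combinations(range(total), n_x))
    # Sanity check: the number of placements is "total choose n_x".
    assert len(placements) == comb(total, n_x)
    # Exactly one placement puts the X's in the lowest n_x positions.
    lowest = tuple(range(n_x))
    favorable = sum(1 for p in placements if p == lowest)
    return favorable / len(placements)


if __name__ == "__main__":
    # 3 measurements of X and 3 of Y: 1 favorable ordering out of 20.
    print(p_all_x_first(3, 3))
```

With 3 and 3 there are 20 equally likely orderings and only one of them is XXXYYY, which is where the 1/20 = 0.05 figure comes from.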


luckyt 2768 days ago

Ah, I failed to consider that. The original post is correct.


gerdesj 2768 days ago

I'm also having trouble with this. On the face of it, the quick and dirty "XXXYYY" test outlined above looks good, but are the following two statements consistent? That is, are the run times of X (new code) and Y (old code) really from the same distribution?

"is my new code faster than my old code"

"If X and Y come from the same distribution then all orderings are equally likely"


bo1024 2768 days ago

I'm thinking of it as a statistical hypothesis test. The null hypothesis is that they come from the same distribution. Under that hypothesis, there's only a 0.05 chance of seeing three X tests all below three Y tests. So if we see this, we can probably reject the null.

If we think the X and Y distributions are both something like normal with similar variance, then we should also be able to say that the chance of seeing XXXYYY, given that Y is actually better than X, is at most 0.05.

But if the distributions for X and Y can be really different, then I think you're right: this test could be misleading! For example, say Y always takes 2 seconds, and X takes 1 second 99% of the time, but 1% of the time it takes an hour. If we run three tests of each, we'll probably only see good runs from X and conclude it's better, when it's not.
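To put rough numbers on that example (a sketch under assumptions, not a benchmark): suppose X takes 1 second with probability 0.99 and 3600 seconds with probability 0.01, Y always takes 2 seconds, and runs are independent. The constant and function names below are made up for illustration.

```python
P_FAST = 0.99      # assumed probability that a run of X is fast
FAST_S = 1.0       # fast run time of X, in seconds
SLOW_S = 3600.0    # slow run time of X (one hour), in seconds
Y_S = 2.0          # Y's constant run time, in seconds
RUNS = 3           # measurements taken of each


def mean_x() -> float:
    """Expected run time of X under the assumed two-point distribution."""
    return P_FAST * FAST_S + (1 - P_FAST) * SLOW_S


def p_only_fast_runs(runs: int = RUNS) -> float:
    """Chance that every one of `runs` independent X runs is fast."""
    return P_FAST ** runs


if __name__ == "__main__":
    print(f"mean X = {mean_x():.2f} s vs Y = {Y_S:.2f} s")
    print(f"P(all {RUNS} X runs are fast) = {p_only_fast_runs():.4f}")
```

Under these assumptions X averages about 37 seconds, far worse than Y, yet roughly 97% of the time all three X runs come in at 1 second and the XXXYYY test would declare X the winner.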


saagarjha 2768 days ago

Well, you’re the one measuring each three times. So this should always be the case?
