You're asking a good question -- but you know a lot more than what you write above.

The main thing is that you know C, D, E, and F each came from either A or B. The p-values above don't account for that; each one only says how likely a discrepancy at least this large would be, from random fluctuation alone, if the sample came from the same source as A (or B).

That's reflected in the fact that the pairs of p-values don't add to one! (Like (A,C) and (B,C) in the table above.)
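To make that concrete, here's a rough sketch in plain Python (standard library only). The samples A, B, and C are made up for illustration, not the data from this thread, and the helper names are my own. I'm also using a permutation p-value in place of the usual asymptotic KS formula, which is a substitution on my part. Each p-value is computed under its own null hypothesis ("same source as A", "same source as B"), so nothing forces the pair to sum to one.

```python
import random
from bisect import bisect_right


def ks_statistic(x, y):
    """Largest absolute gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The supremum of a difference of step functions is reached at a data point.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def permutation_pvalue(x, y, stat=ks_statistic, n_perm=1000, seed=0):
    """Fraction of label shufflings whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = stat(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: two sources and one sample of unknown origin, n = 40 each.
data_rng = random.Random(1)
A = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
B = [data_rng.gauss(0.3, 1.0) for _ in range(40)]
C = [data_rng.gauss(0.15, 1.0) for _ in range(40)]

p_ac = permutation_pvalue(A, C)
p_bc = permutation_pvalue(B, C)
print("p(A,C) =", p_ac)
print("p(B,C) =", p_bc)
print("sum    =", p_ac + p_bc)
```

If you want something like "probability C came from A rather than B", you need a model that uses the either/or constraint directly; two separate tests won't give you that.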

You also implicitly know that at least one of {C,D,E,F} is A-like and one is B-like (otherwise there would not be a problem). So even if you knew P(X and Y have same source) for all (X,Y), which you don't, you couldn't just multiply them, because that constraint means the assignments aren't independent.

Finally, the KS test will tend to understate the evidence that two samples differ. This is because it looks at only one thing, the maximum difference between the two empirical CDFs. The significant differences between the distributions may lie elsewhere, like in the tails, and the KS test is known to be relatively insensitive to tail behavior. (Although at n=40 you won't be able to see far into the tails.)

There are a host of other tests that use the same idea (empirical CDF difference) but weight differently. Some can be more effective than the KS test if you're looking for certain types of difference. Here's an OK overview, albeit for the goal of assessing normality:

http://www.instatmy.org.my/downloads/e-jurnal%202/3.pdf
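Here's a minimal sketch of what "same idea, different weighting" can look like, again in plain Python. It sums the squared ECDF difference over the pooled sample with a weight that depends on the pooled ECDF value h: a constant weight gives a Cramér-von Mises-type statistic, and 1/(h(1-h)) gives an Anderson-Darling-type statistic that emphasizes the tails. The function names and the made-up samples are my assumptions, normalizing constants are dropped (they don't matter for a permutation p-value), and the p-values come from permutation rather than published tables.

```python
import random
from bisect import bisect_right


def weighted_ecdf_statistic(x, y, weight):
    """Sum of weight(h) * (F_x - F_y)^2 over the pooled sample points."""
    xs, ys = sorted(x), sorted(y)
    pooled = sorted(xs + ys)
    total = 0.0
    for v in pooled:
        h = bisect_right(pooled, v) / len(pooled)  # pooled ECDF at v
        if h >= 1.0:
            continue  # both ECDFs equal 1 here, so the difference is zero
        diff = bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys)
        total += weight(h) * diff * diff
    return total


def cvm_type(x, y):
    # Constant weight: every part of the distribution counts equally.
    return weighted_ecdf_statistic(x, y, lambda h: 1.0)


def ad_type(x, y):
    # Weight grows as h approaches 0 or 1, so tail differences count more.
    return weighted_ecdf_statistic(x, y, lambda h: 1.0 / (h * (1.0 - h)))


def permutation_pvalue(x, y, stat, n_perm=1000, seed=0):
    """Fraction of label shufflings whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = stat(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical samples with the same center but different spread, n = 40 each.
data_rng = random.Random(2)
X = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
Y = [data_rng.gauss(0.0, 2.0) for _ in range(40)]

print("CvM-type p =", permutation_pvalue(X, Y, cvm_type))
print("AD-type  p =", permutation_pvalue(X, Y, ad_type))
```

Which weighting does better depends on where the two distributions actually differ, so it's worth trying more than one.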

In a real problem, it's always a good idea to use more empirical-cdf tests than just the KS test, to compare variances and other moments as some people in the thread have done, and to make histogram or CDF plots -- especially if you're in just 1 dimension and the plots are easy to interpret.
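As a starting point for that kind of side-by-side look, something like the following would do. It's only a sketch: the samples are invented, the function names are mine, and the moments are plain population moments with no small-sample correction. The ECDF points it returns can be handed to whatever plotting tool you like as a step plot.

```python
import random


def moment_summary(sample):
    """Mean, variance, skewness, and excess kurtosis (population versions)."""
    n = len(sample)
    mean = sum(sample) / n

    def central(k):
        return sum((v - mean) ** k for v in sample) / n

    m2, m3, m4 = central(2), central(3), central(4)
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


def ecdf_points(sample):
    """(value, cumulative fraction) pairs, ready for a step plot."""
    xs = sorted(sample)
    return [(v, (i + 1) / len(xs)) for i, v in enumerate(xs)]


# Hypothetical samples, n = 40 each.
data_rng = random.Random(3)
A = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
C = [data_rng.gauss(0.0, 1.5) for _ in range(40)]

for name, sample in (("A", A), ("C", C)):
    print(name, moment_summary(sample))

# First few ECDF points of A, just to show the shape of the output.
print(ecdf_points(A)[:3])
```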