by bluemanshoe
5367 days ago
I would use the Kolmogorov-Smirnov test: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

It's available as scipy.stats.ks_2samp.

Results:

Sets |   D   | p-value
-----------------------
A,C  | 0.275 | 0.080
B,C  | 0.175 | 0.531
-----------------------
A,D  | 0.125 | 0.893
B,D  | 0.275 | 0.080
-----------------------
A,E  | 0.100 | 0.983
B,E  | 0.300 | 0.043
-----------------------
A,F  | 0.300 | 0.043
B,F  | 0.100 | 0.983
-----------------------

As far as the test goes, if D is small and p is high, you cannot reject the hypothesis that the two datasets came from the same distribution. The p-value is roughly how often you would see a difference at least this large purely by chance, assuming the null hypothesis (in this case, that both sets are drawn from the same distribution).

In light of this evidence, if they are not lying to us and each of these sets really came from an A-like or B-like distribution, I'd say fairly confidently that:

F is B-like
E is A-like
D is A-like
and C is B-like (though with lower confidence)

The box plot (http://i.imgur.com/epPw7.png) seems to confirm this.
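For anyone who wants to try this, here is a minimal sketch of the comparison. The original data isn't reproduced in this thread, so the samples below are synthetic stand-ins, and the helper name `classify`, the sample size of 40, and the spreads are assumptions made purely for illustration. The only library call involved is scipy.stats.ks_2samp, which returns the D statistic and the p-value.

```python
import numpy as np
from scipy.stats import ks_2samp


def classify(candidate, ref_a, ref_b):
    """Label a sample 'A' or 'B' by whichever reference gives the higher KS p-value.

    Hypothetical helper, not part of scipy. Returns (label, p_a, p_b).
    """
    p_a = ks_2samp(candidate, ref_a).pvalue
    p_b = ks_2samp(candidate, ref_b).pvalue
    label = "A" if p_a >= p_b else "B"
    return label, p_a, p_b


# Synthetic stand-ins (assumption): a wide "A-like" and a narrow "B-like" set.
rng = np.random.default_rng(0)
ref_a = rng.normal(loc=0.0, scale=10.0, size=40)
ref_b = rng.normal(loc=0.0, scale=4.5, size=40)
unknown = rng.normal(loc=0.0, scale=4.5, size=40)

label, p_a, p_b = classify(unknown, ref_a, ref_b)
print(f"vs A: p={p_a:.3f}   vs B: p={p_b:.3f}   -> {label}-like")
```

Picking the larger p-value only makes sense under the premise above, that every set really is either A-like or B-like. A high p-value on its own is a failure to reject, not proof that two sets share a distribution.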
|
Actually, just looking at the variance of the columns tells the same story as you've discovered above:
> var(data)
        A         B         C         D         E         F
117.19610  20.49239  33.13114  90.62195 115.39044  27.34298
A, D, and E fall in one group; B, C, and F fall in the other.
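The grouping can be checked mechanically from the variances reported above. This is only a sketch: comparing on a log scale (so that ratios matter, not absolute differences) is my own choice for illustration, and `nearest_reference` is a made-up helper name.

```python
import math

# Sample variances exactly as reported in the output above.
variances = {
    "A": 117.19610, "B": 20.49239, "C": 33.13114,
    "D": 90.62195, "E": 115.39044, "F": 27.34298,
}


def nearest_reference(name, refs=("A", "B")):
    """Return the reference set whose variance is closest on a log scale."""
    return min(refs, key=lambda r: abs(math.log(variances[name] / variances[r])))


groups = {name: nearest_reference(name) for name in ("C", "D", "E", "F")}
print(groups)  # {'C': 'B', 'D': 'A', 'E': 'A', 'F': 'B'}
```

C is again the least clear-cut case: its variance is about 1.6 times B's, whereas F's is about 1.3 times B's. That matches the lower confidence for C in the KS results.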