by gambling8nt
6001 days ago
What Zed is saying when he notes that meta-statistics are normal is that, thanks to the central limit theorem, the averages of data sets collected from the same underlying probability distribution will tend to be normally distributed as the sample size approaches infinity, even if the underlying system behavior is far from a normal distribution. The same goes for their standard deviations, although that asks slightly more of the distribution (a finite fourth moment rather than just a finite variance). In practice you work with finite sample sizes, so an underlying distribution sufficiently far from normal will produce a non-normal distribution of meta-statistics. In most applications, though, these sorts of pathological distributions are largely irrelevant.

Take our example of looking at response time for loading a web page. There is some finite point (say, 10 sec) beyond which we no longer care how much longer it takes. So instead of considering the distribution of response times t, we consider the distribution of min(t, 10 sec). This distribution only has support over a finite interval, so all of its moments are finite and its meta-statistics approach normality quickly as you increase the number of trials. Using it will under-report the actual standard deviation of the response time (which might, as you say, not even converge), since we've eliminated extremely low-probability events with very long response times. As a practical matter this is largely irrelevant: if these events are probable enough for us to care, we'll notice them anyway. The point of this exercise is not to perfectly ascertain the underlying distribution of t; it is to develop useful predictions of system behavior in practice.
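A quick simulation sketch of the capping idea, under assumptions that are not in the comment: response times are modeled as a scaled Pareto with shape 1.5 (finite mean, infinite variance), the 10 sec cap is the example above, and the helper names (`response_time`, `capped_means`, `skewness`, `raw_vs_capped_sd`) are hypothetical. It shows the skewness of the mean of min(t, 10 sec) shrinking toward 0, the value for a normal distribution, as the per-trial sample size grows, and compares the standard deviation of the same draws before and after capping.

```python
import random
import statistics

# From the comment: beyond ~10 sec we stop caring how slow a page load is.
CAP_SECONDS = 10.0


def response_time(rng: random.Random) -> float:
    """Hypothetical heavy-tailed response time in seconds.

    Assumption: 0.5 * Pareto(alpha=1.5). Values start at 0.5 sec; the mean is
    finite but the variance is infinite, so the raw standard deviation never
    settles down no matter how many samples you take.
    """
    return 0.5 * rng.paretovariate(1.5)


def capped_means(n_samples: int, n_trials: int, seed: int = 0) -> list[float]:
    """Mean of min(t, CAP_SECONDS) over n_samples draws, repeated n_trials times."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        draws = [min(response_time(rng), CAP_SECONDS) for _ in range(n_samples)]
        means.append(statistics.fmean(draws))
    return means


def skewness(xs: list[float]) -> float:
    """Population skewness of xs; 0 for a symmetric (e.g. normal) distribution."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs, mu)
    return statistics.fmean([((x - mu) / sd) ** 3 for x in xs])


def raw_vs_capped_sd(n_samples: int, seed: int = 0) -> tuple[float, float]:
    """Sample standard deviation of the same draws, before and after capping."""
    rng = random.Random(seed)
    raw = [response_time(rng) for _ in range(n_samples)]
    capped = [min(t, CAP_SECONDS) for t in raw]
    return statistics.stdev(raw), statistics.stdev(capped)


if __name__ == "__main__":
    # Skewness of the capped mean should shrink roughly like 1/sqrt(n).
    for n in (5, 50, 200):
        print(f"n={n:3d}  skewness of capped means: {skewness(capped_means(n, 2000)):+.2f}")
    raw_sd, capped_sd = raw_vs_capped_sd(10_000)
    print(f"raw sd: {raw_sd:.2f}  capped sd: {capped_sd:.2f}")
```

Capping works here because min(t, 10) is bounded, so every moment exists and the central limit theorem kicks in. And since |min(a, c) - min(b, c)| is never larger than |a - b|, the capped sample standard deviation can never exceed the raw one computed from the same draws, which is exactly the under-reporting described above.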