Hacker News
by neals 1540 days ago
What is a sigma?
6 comments

I work in this field (different experiment); despite the downvotes this is a reasonable question. Reposting my comment from above, since there is confusion here (the other sibling comments are incorrect).

In particle physics, sigma denotes "significance", not standard deviation. Technically what we're quoting as "sigmas" are "z-values": z = Phi^{-1}(1 - p), where Phi^{-1} is the inverse CDF of the standard Normal distribution and p is the p-value of the experimental result. So, 7 sigma is defined to be the level of significance (for an arbitrary distribution) corresponding to the same quantile as 7 standard deviations out in a Normal distribution.
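For anyone who wants to play with the numbers, here is a minimal Python sketch of that conversion, using only the standard library. The function names `p_to_sigma` and `sigma_to_p` are mine, and it assumes the one-sided convention given by the formula above; it is an illustration, not how any particular experiment's software does it.

```python
from math import erfc, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Normal: mean 0, standard deviation 1


def p_to_sigma(p: float) -> float:
    """One-sided p-value -> significance z = Phi^{-1}(1 - p)."""
    # By symmetry Phi^{-1}(1 - p) == -Phi^{-1}(p). Passing p directly avoids
    # the rounding error of forming 1 - p when p is tiny.
    return -_STD_NORMAL.inv_cdf(p)


def sigma_to_p(z: float) -> float:
    """Significance z -> one-sided p-value 1 - Phi(z)."""
    # 1 - Phi(z) == erfc(z / sqrt(2)) / 2, which stays accurate far in the tail.
    return 0.5 * erfc(z / sqrt(2.0))


if __name__ == "__main__":
    for z in (3, 5, 7):
        p = sigma_to_p(z)
        print(f"{z} sigma -> p = {p:.3e} -> back to {p_to_sigma(p):.3f} sigma")
```

Note that `p_to_sigma` needs 0 < p < 1, since `NormalDist.inv_cdf` raises an error outside that range.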

This is the correct answer.

In other words, "z sigma" means that a result like this occurring as a statistical fluke is just as likely as a standard-normal distributed variable giving a value above z.

I would add: If the null hypothesis is true, then "a result like this..." (in this case the null hypothesis is, of course, that the Standard Model is true).
If the null hypothesis were true, and the experiment were repeated an infinite number of times with a different sample each time, then "a result like this or more extreme..."
I agree with adding the "more extreme" part, but I'm not so sure about the infinite number of times part. Certainly, the p-value is (roughly speaking) the probability of seeing a result at least as extreme as the observed result, under the null hypothesis. But one doesn't really need to introduce hypothetical infinite sequences of replications to make sense of that definition.
Isn't the bit about repeating the study over and over again the whole basis of frequentist statistics, though? (Indeed isn't that why it's called frequentism?)
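To make the "repeat the experiment many times" reading concrete, here is a rough Python sketch that estimates a p-value as a long-run frequency. The setup is a made-up toy (a fair coin flipped 100 times, with 60 heads observed), not anything from a real analysis, and the function name and defaults are my own. It replaces "infinitely many" repetitions with a finite number, so the result is only an estimate.

```python
import random
from statistics import NormalDist


def simulate_null_p_value(observed_heads: int, n_flips: int = 100,
                          n_repeats: int = 20_000, seed: int = 0) -> float:
    """Fraction of simulated null experiments at least as extreme as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    at_least_as_extreme = 0
    for _ in range(n_repeats):
        # One replication of the experiment under the null (a fair coin).
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_repeats


if __name__ == "__main__":
    p_hat = simulate_null_p_value(observed_heads=60)
    # z = Phi^{-1}(1 - p), written as -Phi^{-1}(p) by symmetry.
    z = -NormalDist().inv_cdf(p_hat)
    print(f"estimated p = {p_hat:.4f}, about {z:.2f} sigma")
```

The same probability can be computed exactly from the binomial distribution without simulating anything, which is the point made above: the replications are one way to picture the definition, not a requirement for it.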
What's that in Bayesian terms?
The probability of N(0,1) emitting >= 7. (So, one minus the CDF of the standard normal distribution at 7.)
> sigma denotes "significance", not standard deviation.

Nitpick: this is still a standard deviation in some (potentially very contrived and nonlinear) coordinate system. (As a simple example, a log-normal distribution might have a mean of 1 and a standard deviation that effectively amounts to multiplying or dividing by 2. Edit: also, multidimensional stuff might have to be shoehorned into a polar coordinate system.) But in practice you'd never bother to construct such a coordinate system, so that's more a mathematical artifact than anything useful.

No, there is no coordinate system. This is referring to the distribution of a test statistic for hypothesis testing. It's a 1-d real scalar, and coordinate transforms don't have any meaningful statistical representation. Of course there are much higher-dimensional distributions, in all sorts of coordinate systems, involved in sampling the test statistic, but at the end of the day this is all you are left with. If you change the underlying distributions of the model, then of course you will change the test statistic distribution, but that's meaningless, since the whole point of the test statistic is to quantify an observation in the context of a given model.

Anyway, as I mentioned elsewhere, the motivation for calling it sigma is that, by construction, it maps onto the quantiles of the standard Normal distribution. So an N-sigma result will have the same p-value as N standard deviations in a Normal distribution. So you can associate "sigmas" with "standard deviations of the Normal distribution". Perhaps this is what you are trying to say, but it does not make sigma a standard deviation in any statistical sense, i.e. it is not necessarily related to the variance of the relevant distribution.
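A small Python sketch of why the two notions differ, under an assumed toy null: the test statistic follows an exponential distribution with rate 1 (mean 1, standard deviation 1). That distribution and the function names are hypothetical choices for illustration only.

```python
from math import exp
from statistics import NormalDist

# Hypothetical toy null: the test statistic t follows an exponential
# distribution with rate 1, which has mean 1 and standard deviation 1.
NULL_MEAN = 1.0
NULL_STD = 1.0


def std_devs_above_mean(t: float) -> float:
    """Naive distance of t from the null mean, in units of the null's own std dev."""
    return (t - NULL_MEAN) / NULL_STD


def significance(t: float) -> float:
    """Quoted "sigma": the standard-Normal quantile with the same tail probability."""
    p = exp(-t)  # P(T >= t) for an Exp(1) statistic, valid for t > 0
    return -NormalDist().inv_cdf(p)  # equals Phi^{-1}(1 - p) by symmetry


if __name__ == "__main__":
    t_obs = 3.0
    print(f"naive: {std_devs_above_mean(t_obs):.2f} std devs above the mean")
    print(f"quoted: {significance(t_obs):.2f} sigma")
```

For an observed value of 3, the statistic sits 2 standard deviations of the toy null above its mean, but the quoted significance comes out at about 1.65 sigma, because only the tail probability enters the conversion.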

oh wow, thanks for pointing this out :)
For what it's worth, sigma is chosen for this purpose specifically to evoke the notion of "standard deviations". But quoting the std dev. directly is useless, since the distribution is unspecified. So we "convert" the statistical significance to the corresponding number of standard deviations of the Normal distribution, since that is a familiar distribution. If you like, it's another way of stating p-values, which physicists prefer because ours can have lots of zeros :)
A unit used in statistics:

> In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values.[1]

* https://en.wikipedia.org/wiki/Standard_deviation

I thought you were going to link to https://simple.wikipedia.org/wiki/Standard_deviation

The "Simple English Wikipedia" is a really underrated resource for understanding jargon outside your field.

1 sigma = 1 standard deviation
A measurement of uncertainty.
A standard deviation.
It's just a way to say something has probability of 0.0000000002% while looking smart.
It actually is smart to compress this on an exponential scale instead of writing a large number like this :p