by londons_explore 1534 days ago
Does it make sense to even discuss the sigma of any deviation?

When you add in the "10% chance that some scientist messed up the maths or something in the experiment", then it's impossible to ever reach 7 sigma...

7 comments

Yes. The meaning of 7 sigma is: It's very very unlikely that this is a statistical fluke, it must have a different reason (new physics, systematic error, ...)
Known unknowns, and unknown unknowns. Still useful to quantify the known unknowns and compare significance of various events according to them.
When you look at the graph at the bottom, several independent measurements have non-overlapping error bars, and are even on opposite sides of the Standard Model prediction. So, yeah, somewhere along the line there've been bad measurement errors...
Since error bars are typically +-1 sigma, you expect about 1/3 of all measurements to be further away from the true value than the error bar, if all error estimates are correct, and uncorrelated. That's actually a check a lot of doctored data fails.
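The "about 1/3" figure can be checked directly. This is a minimal sketch, assuming the errors are exactly Gaussian and uncorrelated; the helper name `fraction_outside` is made up for illustration.

```python
import math

def fraction_outside(k):
    """Probability that a Gaussian measurement lands more than
    k standard deviations away from the true value (two-sided)."""
    return math.erfc(k / math.sqrt(2))

# Fraction of measurements expected to miss the true value by more
# than the quoted error bar, for error bars of 1, 2 and 3 sigma.
for k in (1, 2, 3):
    print(f"outside +-{k} sigma: {fraction_outside(k):.4f}")
```

For +-1 sigma error bars this comes out a bit under one third, so a set of measurements where every error bar covers the true value is itself suspicious.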
This is why these measures have to be taken with a grain of salt (but are still useful).

Probability is subjective, in this case because it's dependent on the design of the experiment / quality of the analysis of that experiment to determine a p-value of a given result.

The book "Bayesian analysis in high energy physics" is a short and sweet introduction. If I got the title wrong I'll update it later.

Then it would never make sense, because someone messing up somewhere is always a possibility.

I would assume that the implication is that it's 7 sigma assuming the measurements were done correctly.

Yeah, my thought from reading the headline was, "That's a funny way of saying we were completely wrong."
If a quantity cannot be negative (such as a mass), then standard deviation isn't the best choice.

EDIT: Yes, because the Gaussian distribution extends to +/- infinity; davrosthedalek explains it best, below.

A fair dice roll can only have positive values {1,2,3,4,5,6} but it has a clearly defined std deviation: sqrt(105/36) -- there's no clear reason this isn't the 'best choice'; that's just a matter of application.
The point about applications is mostly valid even if theoretically unsatisfying, but I think the thing about dice rolls is basically spurious.
You can calculate the mean μ and the standard deviation σ of a dice roll. You get μ=3.5, σ=sqrt(105/36)~=1.707... . It's not very similar to a Gaussian, but sometimes these numbers are useful anyway.

It's more interesting if you calculate the distribution of the sum of rolling 100 dice. It's easy to calculate, because μ=100*3.5=350, σ=sqrt(100*105/36)~=17.07... But now the distribution is very similar to a Gaussian with the same μ and σ. https://en.wikipedia.org/wiki/Central_limit_theorem They are not equal because the sum of 100 dice rolls is bounded between 100 and 600 and the Gaussian is not bounded. For most applications, you can just use the Gaussian instead of the exact distribution.
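The comparison for 100 dice can be sketched by building the exact distribution of the sum and putting it next to a Gaussian with the same mean and standard deviation. The function names here are illustrative, not from any library.

```python
import math

def dice_sum_pmf(n):
    """Exact probability of each possible total when rolling n fair dice,
    built by adding one die at a time (repeated convolution)."""
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0.0) + p / 6
        pmf = nxt
    return pmf

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

n = 100
pmf = dice_sum_pmf(n)
mu = sum(k * p for k, p in pmf.items())
sigma = math.sqrt(sum((k - mu) ** 2 * p for k, p in pmf.items()))
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"support: {min(pmf)} to {max(pmf)}")

# Compare exact probabilities with the Gaussian density at a few totals.
for k in (300, 325, 350, 375, 400):
    print(k, f"exact={pmf[k]:.6f}", f"gauss={gaussian_pdf(k, mu, sigma):.6f}")
```

The exact distribution stops at 100 and 600 while the Gaussian does not, but those limits are many standard deviations from the mean, so the two agree closely wherever the probability is not negligible.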

The predicted value is so incredibly far from zero that you can pretend it's a truncated Gaussian and not see any actual difference in the results.

Alternate reply: Gaussian approximation to the binomial is perfectly valid in all sorts of cases.

What would be a better choice?
GP is probably referring to the coefficient of variation, sigma/mu (standard deviation divided by mean), which normalises out for example the unit of measurement.

However, the 7 here is basically (x - mu)/sigma, so it is normalised (in that sense), anyway.
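As a rough sketch of what that normalised number means: the inputs below are hypothetical placeholders, not the values from the experiment, and the tail probability assumes a Gaussian error model.

```python
import math

def z_score(x, mu, sigma):
    """Deviation of a measurement x from the prediction mu, in units of sigma."""
    return (x - mu) / sigma

def two_sided_p(z):
    """Probability of a Gaussian fluctuation at least |z| sigma in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical numbers, chosen only so that the deviation is 7 sigma.
z = z_score(x=107.0, mu=100.0, sigma=1.0)
print(f"z = {z:.1f}, two-sided p = {two_sided_p(z):.2e}")
```

The result is unit-free: rescaling x, mu and sigma by the same factor leaves z unchanged. The tiny p-value only covers statistical fluctuation; it says nothing about the chance of a systematic error.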

No, I think the problem (in principle) is that "standard deviation" has a special meaning for Gaussian distributions, which extend to infinity in both directions. A quantity that has a fixed range has most likely an asymmetric distribution, so one would expect an asymmetric error bar as well. But for a sigma<<the value, it's often not a big concern.

A good example is efficiency measurements. I can't count how often I have seen students say something like: Our detector is 99%+-3% efficient. Obviously a detector can't be 102% efficient.

> "standard deviation" has a special meaning for Gaussian distributions,

I have a master's degree in statistics and this is the first I'm hearing about it.

> Our detector is 99%+-3% efficient. Obviously a detector can't be 102% efficient.

In the absence of any other context I'd guess that they're using an approximation to a confidence interval that might be perfectly fine if the estimated value was nearer the center of the allowable range.

Well, special in two senses: First, in the canonical formula for Gaussians, sigma appears directly. Second, for the case at hand, the confidence limits associated with 1 sigma, 2 sigma etc. in physics match exactly the area under the curve for a Gaussian integrated +- said sigma around the mean. That's where that connection actually comes from, and a physicist will always think: Within 1 sigma? That's 68%.

Hearing 99+-3% is a very strong indication that the person used an incorrect way to determine the uncertainty, most likely by taking the square root of counts. But you are right, if the efficiency were around 50%, that approximation would not be so bad.

What's wrong with saying "Our detector is 99%+-3% efficient," if they are giving the output of some procedure that constructs valid confidence intervals? The confidence intervals will trap the true value 95% of the time (or whatever the confidence level is). If it does what it promises to do, I don't see the problem.
Because 99+3=102 is not a valid upper interval bound. You cannot have >100% efficiency for a detector. Also, your expected value cannot be centered in the interval. So maybe 99+1-3 is a valid range (but I would be very suspicious if the bound includes 100%).
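The efficiency example can be sketched with made-up counts. Everything here is an assumption for illustration: 990 detected out of 1000 trials, a guess at what the square-root-of-counts shortcut looks like, and the Wilson score interval as one of several binomial intervals that stay inside 0 to 100%. It is not necessarily what anyone in the thread used.

```python
import math

def sqrt_counts_interval(k, n):
    """Naive shortcut: efficiency k/n with uncertainty sqrt(k)/n.
    Shown only to illustrate the mistake; it ignores the binomial nature of k."""
    eff = k / n
    err = math.sqrt(k) / n
    return eff - err, eff + err

def wilson_interval(k, n, z=1.0):
    """Wilson score interval for a binomial proportion.
    z=1.0 corresponds to roughly 68% coverage."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

k, n = 990, 1000  # hypothetical counts: detected events, total events
print("sqrt(counts):", sqrt_counts_interval(k, n))
print("Wilson:      ", wilson_interval(k, n))
```

With these counts the shortcut gives roughly 99% +- 3% and an upper bound above 100%, while the Wilson interval stays below 100% and is asymmetric around 99%, wider on the low side.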