Hacker News (HN Mirror)

by parpfish 91 days ago

As to the philosophy of “why” the CLT gives you normals, my hunch is that it’s because there’s some connection between:

a) the CLT requires samples drawn from a distribution with finite mean and variance

and b) the Gaussian is the maximum entropy distribution for a particular mean and variance

I’d be curious about what happens if you start making assumptions about higher order moments in the distro
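Point (b) can be sanity-checked with closed-form differential entropies. The sketch below compares a Gaussian, a Laplace, and a uniform distribution that all share the same variance; the choice of `sigma = 1.0` and of these two comparison families is just an assumption for illustration, not a proof of the maxent property.

```python
import math

sigma = 1.0  # common standard deviation (arbitrary choice for illustration)

# Closed-form differential entropies, in nats, at variance sigma**2.
h_gauss = 0.5 * math.log(2 * math.pi * math.e * sigma**2)

# Laplace with scale b has variance 2*b**2, so b = sigma / sqrt(2).
b = sigma / math.sqrt(2)
h_laplace = 1 + math.log(2 * b)

# Uniform of width w has variance w**2 / 12, so w = sigma * sqrt(12).
w = sigma * math.sqrt(12)
h_uniform = math.log(w)

for name, h in [("gaussian", h_gauss), ("laplace", h_laplace), ("uniform", h_uniform)]:
    print(f"{name:9s} entropy = {h:.4f} nats")
```

At equal variance the Gaussian comes out on top, which is what the maximum entropy statement predicts for any competitor with that variance.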

4 comments

orangemaen 91 days ago

The standard framing defines the Gaussian as this special object with a nice PDF, then presents the CLT as a surprising property it happens to have. But convolution of densities is the fundamental operation. If you keep convolving any finite-variance distribution with itself, the shape converges, and we called the limit "normal." The Gaussian is a fixed point of iterated convolution under √n rescaling. It earned its name by being the thing you inevitably get, not by having elegant closed-form properties.
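A rough numerical sketch of the iterated-convolution picture, assuming NumPy is available. It convolves a small skewed pmf with itself and measures the sup-distance between the resulting CDF and a normal CDF with the same mean and variance (matching mean and variance plays the role of the √n rescaling). The pmf `[0.7, 0.2, 0.1]` and the function name are made up for illustration.

```python
import numpy as np
from math import erf, sqrt

def distance_to_normal(pmf, n):
    """Sup-distance between the CDF of a sum of n i.i.d. draws from `pmf`
    (supported on 0, 1, 2, ...) and the normal CDF with matching mean/variance."""
    pmf = np.asarray(pmf, dtype=float)
    support = np.arange(len(pmf))
    mean = float(np.dot(support, pmf))
    var = float(np.dot((support - mean) ** 2, pmf))

    total = pmf.copy()
    for _ in range(n - 1):
        total = np.convolve(total, pmf)  # pmf of a sum = convolution of pmfs

    k = np.arange(len(total))
    cdf = np.cumsum(total)
    # Evaluate the normal CDF at k + 0.5 (continuity correction for a lattice).
    z = (k + 0.5 - n * mean) / sqrt(n * var)
    normal_cdf = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])
    return float(np.max(np.abs(cdf - normal_cdf)))

base_pmf = [0.7, 0.2, 0.1]  # hypothetical skewed starting distribution
distances = [distance_to_normal(base_pmf, n) for n in (1, 4, 16, 64)]
for n, d in zip((1, 4, 16, 64), distances):
    print(f"n = {n:2d}   sup |CDF - normal CDF| = {d:.4f}")
```

The distance shrinks as n grows, even though nothing Gaussian went into the starting distribution.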

The most interesting assumptions to relax are the independence assumptions. The CLT is way more permissive about them than the textbook version suggests. You need dependence to decay fast enough, and mixing conditions (α-mixing, also known as strong mixing) give you exactly that: correlations that die off let the CLT go through essentially unchanged. Where it genuinely breaks is long-range dependence: fractionally integrated processes, Hurst parameter above 0.5, where autocorrelations decay hyperbolically instead of exponentially. There the √n normalization is wrong, you get different scaling exponents, and sometimes non-Gaussian limits.

There are also interesting higher order terms. The √n is specifically the rate that zeroes out the higher-order cumulants. Skewness (third cumulant) decays at 1/√n, excess kurtosis at 1/n, and so on up. Edgeworth expansions formalize this as an asymptotic series in powers of 1/√n with cumulant-dependent coefficients. So the Gaussian is the leading term of that expansion, and Edgeworth tells you the rate and structure of convergence to it.
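The decay rates can be checked exactly for a small example, since cumulants add under convolution. This sketch (assuming NumPy; the pmf `[0.7, 0.2, 0.1]` and the helper name are made up for illustration) computes the skewness and excess kurtosis of an n-fold sum and rescales them by √n and n respectively.

```python
import numpy as np

def skew_and_excess_kurtosis(pmf):
    """Skewness and excess kurtosis of a pmf supported on 0, 1, 2, ..."""
    pmf = np.asarray(pmf, dtype=float)
    k = np.arange(len(pmf))
    centered = k - np.dot(k, pmf)
    m2 = np.dot(centered**2, pmf)
    m3 = np.dot(centered**3, pmf)
    m4 = np.dot(centered**4, pmf)
    return float(m3 / m2**1.5), float(m4 / m2**2 - 3.0)

base = np.array([0.7, 0.2, 0.1])  # hypothetical skewed starting distribution
g1, g2 = skew_and_excess_kurtosis(base)

rows = []
total = base.copy()
for n in range(1, 65):
    if n > 1:
        total = np.convolve(total, base)  # pmf of the n-fold sum
    if n in (1, 4, 16, 64):
        s, kappa = skew_and_excess_kurtosis(total)
        rows.append((n, s, kappa))
        print(f"n = {n:2d}   skew*sqrt(n) = {s * np.sqrt(n):.6f}   "
              f"excess_kurtosis*n = {kappa * n:.6f}")
```

Both rescaled columns stay constant at the single-draw values `g1` and `g2`, which is the 1/√n and 1/n decay in action.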

ramblingrain 91 days ago

It is the not knowing, the unknown unknowns and known unknowns which result in the max entropy distribution's appearance. When we know more, it is not Gaussian. That is known.

mitthrowaway2 91 days ago

Exactly this. From this perspective, the CLT can then be restated as: "it's interesting that when you add up a sufficiently large number of independent random variables, then even if you have a lot of specific detailed knowledge about each of those variables, in the end all you know about their sum is its mean and variance. But at least you do reliably know that much."

D-Machine 91 days ago

Came here basically looking to see this explanation. Normal dist is [approximately] common when summing lots of things we don't understand, otherwise, it isn't really.

sobellian 91 days ago

IIRC there's a video by 3b1b that talks about that, and it is important that gaussians are closed under convolution.

gowld 91 days ago

That makes it an equilibrium point in function space, but the other half is why it's a global attractor.

pfortuny 91 days ago

There must be a contractive nature in "passing to the limit". And then Brouwer's fixed point theorem.

(I know it is very easy to do "maths" this way).

derbOac 91 days ago

IIRC the third moment defines a maxent distribution under certain conditions and with a fourth moment it becomes undefined? It's been a while though.

If I'm remembering it correctly it's interesting to think about the ramifications of that for the moments.