by mikrl 91 days ago
Great article. Personally I have been learning more about the mathematics of beyond-CLT scenarios (fat tails, infinite variance, etc.).

The great philosophical question is why CLT applies so universally. The article explains it well as a consequence of the averaging process.

Alternatively, I’ve read that natural processes tend to exhibit Gaussian behaviour because there is a tendency towards equilibrium: forces, homeostasis, central potentials and so on and this equilibrium drives the measurable into the central region.

For processes such as prices in financial markets, with complicated feedback loops and reflexivity (in the Soros sense), the probability mass tends to end up in the non-central region, where the CLT does not apply.

5 comments

The key principle is that you get CLT when a bunch of random factors add. Which happens in lots of places.

In finance, the effects of random factors tend to multiply. So you get a log-normal curve.
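
A quick way to see the additive vs. multiplicative point is a small simulation. This is only a sketch under assumed settings: the 50 factors, the uniform(0.9, 1.1) range, and the helper names `skewness` and `simulate` are made up for illustration and do not come from the article. Summing the factors should give a roughly symmetric result, multiplying them a right-skewed one, and the log of the product should look symmetric again.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: mean of the cubed standardized values."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def simulate(n_factors=50, n_samples=10000, seed=0):
    """Draw random factors, then combine them by adding and by multiplying."""
    rng = random.Random(seed)
    sums, products = [], []
    for _ in range(n_samples):
        # Assumed toy model: each factor is a small random shock around 1.
        factors = [rng.uniform(0.9, 1.1) for _ in range(n_factors)]
        sums.append(sum(factors))
        products.append(math.prod(factors))
    return sums, products


sums, products = simulate()
log_products = [math.log(p) for p in products]

print("skewness of sums:        ", round(skewness(sums), 3))
print("skewness of products:    ", round(skewness(products), 3))
print("skewness of log products:", round(skewness(log_products), 3))
```

Taking logs turns the product into a sum of log-factors, which is why the CLT applies to the log and the product itself comes out approximately log-normal.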

As Taleb points out, though, the assumptions behind the log-normal break down in large market movements, because things that were uncorrelated become correlated. The result is fat tails, where extreme combinations of events (aka "black swans") become far more likely than naively expected.

Some correlations are fine, though: there are versions of the CLT that apply even when there are benign correlations.

https://en.wikipedia.org/wiki/Central_limit_theorem#Dependen...

I know you know that and were just simplifying. Just wanted this fact to be better known for practitioners. Your comment on multiplicative processes is spot on.

I say more here

https://news.ycombinator.com/item?id=47437152

It's a bit of a shame that these other limiting distributions are not as tractable as the Gaussian.

Absolutely. The effect of straightforward correlations is a change in the variance, which can be measured in finance.

The effect of the nonlinear changing correlations is that future global behavior can't be predicted from local observations without a very sophisticated model.

As to the philosophy of “why” the CLT gives you normals, my hunch is that it’s because there’s some connection between:

a) the CLT requires samples drawn from a distribution with finite mean and variance

and b) the Gaussian is the maximum entropy distribution for a particular mean and variance

I’d be curious about what happens if you start making assumptions about higher order moments in the distro
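
Point (b) above can be checked numerically with closed-form differential entropies. This is a minimal sketch, not a proof: it only compares the Gaussian against two other families (uniform and Laplace) picked for illustration, each scaled to the same variance, and the function names are my own.

```python
import math


def gaussian_entropy(var):
    # Differential entropy of N(mu, var): 0.5 * ln(2 * pi * e * var).
    return 0.5 * math.log(2 * math.pi * math.e * var)


def uniform_entropy(var):
    # Uniform on an interval of width w has variance w**2 / 12 and entropy ln(w).
    width = math.sqrt(12 * var)
    return math.log(width)


def laplace_entropy(var):
    # Laplace with scale b has variance 2 * b**2 and entropy 1 + ln(2 * b).
    b = math.sqrt(var / 2)
    return 1 + math.log(2 * b)


VAR = 1.0  # same variance for every family, so only the shape differs
entropies = {
    "gaussian": gaussian_entropy(VAR),
    "laplace": laplace_entropy(VAR),
    "uniform": uniform_entropy(VAR),
}

for name, h in entropies.items():
    print(f"{name:>8}: {h:.4f} nats")
```

With the variance pinned, the Gaussian comes out on top of these three, which is consistent with (though of course weaker than) the general max entropy statement.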

The standard framing defines the Gaussian as this special object with a nice PDF, then presents the CLT as a surprising property it happens to have. But convolution of densities is the fundamental operation. If you keep convolving any finite-variance distribution with itself, the shape converges, and we called the limit "normal." The Gaussian is a fixed point of iterated convolution under √n rescaling. It earned its name by being the thing you inevitably get, not by having elegant closed-form properties.
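
The "keep convolving and the shape converges" claim is easy to try on a discrete distribution. The sketch below assumes an arbitrary lopsided four-point pmf (`BASE`, made up for illustration), convolves it with itself n times, and measures the largest gap between the resulting CDF and a normal CDF with the same mean and variance. The helper names are hypothetical.

```python
import math


def convolve(p, q):
    """Discrete convolution: pmf of the sum of two independent integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def distance_to_normal(base, n):
    """Max CDF gap between the n-fold convolution of `base` and a matched normal."""
    pmf = base
    for _ in range(n - 1):
        pmf = convolve(pmf, base)
    mean = sum(k * p for k, p in enumerate(pmf))
    sd = math.sqrt(sum((k - mean) ** 2 * p for k, p in enumerate(pmf)))
    gap, cdf = 0.0, 0.0
    for k, p in enumerate(pmf):
        cdf += p
        # The +0.5 is the usual continuity correction for integer-valued sums.
        gap = max(gap, abs(cdf - normal_cdf((k + 0.5 - mean) / sd)))
    return gap


# Assumed starting shape on {0, 1, 2, 3}: bimodal and skewed, far from a bell curve.
BASE = [0.5, 0.1, 0.1, 0.3]

for n in (1, 8, 64):
    print(n, distance_to_normal(BASE, n))
```

The gap shrinks as n grows even though the starting pmf looks nothing like a Gaussian, which is the fixed-point behaviour described above.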

The most interesting assumptions to relax are the independence assumptions. They're way more permissive than the textbook version suggests. You need dependence to decay fast enough, and mixing conditions (such as α-mixing, also known as strong mixing) give you exactly that: correlations that die off let the CLT go through essentially unchanged. Where it genuinely breaks is long-range dependence: fractionally integrated processes with Hurst parameter above 0.5, where autocorrelations decay hyperbolically instead of exponentially. There the √n normalization is wrong, you get different scaling exponents, and sometimes non-Gaussian limits.
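
To make the short-memory vs. long-memory contrast concrete, here is a rough sketch using the exact formula for the variance of √n times the sample mean of a unit-variance stationary series, Var = 1 + 2 Σ (1 − k/n) ρ(k). The two autocorrelation functions are assumed toy forms chosen for illustration (exponential decay with φ = 0.5, and hyperbolic decay (1 + k)^(2H − 2) with H = 0.8); they are not meant to represent any specific fractionally integrated model.

```python
def scaled_variance(n, rho):
    """Var(sqrt(n) * sample mean) for a unit-variance stationary series
    with autocorrelation function rho(k)."""
    return 1 + 2 * sum((1 - k / n) * rho(k) for k in range(1, n))


def short_memory(k, phi=0.5):
    # Exponentially decaying autocorrelation, as in an AR(1) process.
    return phi ** k


def long_memory(k, hurst=0.8):
    # Hyperbolically decaying autocorrelation with exponent 2H - 2 (assumed toy form).
    return (1 + k) ** (2 * hurst - 2)


for n in (100, 1000, 10000):
    print(
        n,
        round(scaled_variance(n, short_memory), 4),
        round(scaled_variance(n, long_memory), 1),
    )
```

With exponential decay the √n-scaled variance settles at (1 + φ)/(1 − φ) = 3, so correlation only changes the constant. With hyperbolic decay it keeps growing with n, which is the sense in which the √n normalization is wrong.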

There are also interesting higher order terms. The √n is specifically the rate that zeroes out the higher-order cumulants. Skewness (third cumulant) decays at 1/√n, excess kurtosis at 1/n, and so on up. Edgeworth expansions formalize this as an asymptotic series in powers of 1/√n with cumulant-dependent coefficients. So the Gaussian is the leading term of that expansion, and Edgeworth tells you the rate and structure of convergence to it.
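
The decay rates follow from cumulants being additive over independent summands: the r-th cumulant of a sum of n i.i.d. terms is n·κ_r, and standardizing divides by (n·κ_2)^(r/2). A small sketch, assuming Exp(1) summands as the example (its cumulants are κ_r = (r − 1)!); the function name is my own.

```python
import math


def standardized_cumulant(r, n, cumulants):
    """r-th cumulant of the standardized sum of n i.i.d. draws,
    given the cumulants of a single draw."""
    return n * cumulants[r] / (n * cumulants[2]) ** (r / 2)


# Cumulants of the Exp(1) distribution: kappa_r = (r - 1)!
EXP1 = {r: math.factorial(r - 1) for r in range(1, 5)}

for n in (1, 10, 100, 1000):
    skew = standardized_cumulant(3, n, EXP1)     # equals 2 / sqrt(n)
    ex_kurt = standardized_cumulant(4, n, EXP1)  # equals 6 / n
    print(f"n={n:>4}  skewness={skew:.4f}  excess kurtosis={ex_kurt:.4f}")
```

In general the r-th standardized cumulant scales as n^(1 − r/2), so everything above the second cumulant vanishes in the limit and only the Gaussian part survives.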

It is the not knowing, the unknown unknowns and known unknowns which result in the max entropy distribution's appearance. When we know more, it is not Gaussian. That is known.

Exactly this. From this perspective, the CLT can be restated as: "it's interesting that when you add up a sufficiently large number of independent random variables, then even if you have a lot of specific detailed knowledge about each of those variables, in the end all you know about their sum is its mean and variance. But at least you do reliably know that much."

Came here basically looking to see this explanation. The normal dist is [approximately] common when summing lots of things we don't understand; otherwise, it isn't really.

IIRC there's a video by 3b1b that talks about that, and it is important that Gaussians are closed under convolution.

That makes it an equilibrium point in function space, but the other half is why it's a global attractor.

There must be a contractive nature in "passing to the limit". And then Brouwer's fixed point theorem.

(I know it is very easy to do "maths" this way).

IIRC the third moment defines a maxent distribution under certain conditions and with a fourth moment it becomes undefined? It's been a while though.

If I'm remembering it correctly it's interesting to think about the ramifications of that for the moments.

You (and others) may enjoy going down the rabbit hole of universality. Terence Tao has a nice survey article on this which might be a good place to start: https://direct.mit.edu/daed/article/141/3/23/27037/E-pluribu...

>natural processes tend to exhibit Gaussian behaviour

to me it results from 2 factors: 1. the Gaussian is the max entropy distribution for a given variance, and 2. variance is the model of energy-limited behavior, and physical processes are always under some energy limit. Basically it is the 2nd law.

that’s correct but a better explanation is this https://youtu.be/AwEaHCjgeXk?si=tV72uauquCHvzkNE