by drostie 4040 days ago

So the basic math here looks like this: in classical probability theory, when you've got multiple random variables `X, Y` they have a joint probability density `f(x, y)` such that the probability that (X, Y) is in the rectangle (x ± dx/2, y ± dy/2) is `f(x, y) dx dy`. Independent variables have densities which factor: `f(x, y) = g(x) * h(y)`, to fit the general rule that independent discrete events have probabilities which multiply. To calculate probabilities for a function like `Z = X + Y` you have to build a new probability-density function `p(z) = ∫ f(x, z - x) dx`, which for independent variables becomes `p(z) = ∫ g(x) h(z - x) dx`, a convolution of the two probability densities.
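The discrete analogue of that convolution is easy to play with. Here is a minimal sketch, assuming a distribution over the outcomes 0, 1, 2, ... is stored as a plain list of probabilities; the names `Pmf`, `convolve`, `die` and `twoDice` are made up for illustration and the fair die is just a convenient test case:

```haskell
-- A discrete distribution on 0, 1, 2, ...: element i is P(X = i).
type Pmf = [Rational]

-- Distribution of X + Y for independent X and Y:
-- p(z) = sum over x of g(x) * h(z - x), the discrete form of the integral.
convolve :: Pmf -> Pmf -> Pmf
convolve g h =
  [ sum [ g !! x * h !! (z - x)
        | x <- [0 .. z], x < length g, z - x < length h ]
  | z <- [0 .. length g + length h - 2] ]

-- One fair six-sided die: outcome 0 has probability 0, outcomes 1..6 have 1/6.
die :: Pmf
die = 0 : replicate 6 (1 / 6)

-- Sum of two independent dice.
twoDice :: Pmf
twoDice = convolve die die
```

In GHCi, `twoDice !! 7` comes out as `1 % 6` and `sum twoDice` as `1 % 1`, as you would expect for two dice. The `!!` indexing makes this quadratic and slow for long lists; it is only meant to mirror the formula term by term.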

All of the useful statistics for a probability distribution can be gleaned from the probability-density function by taking the Fourier transform and then (when it's not too bad) the (complex) natural logarithm, or, when it's easier and the expectation exists, just `K(s) = log(E[exp(s X)])` (with `s X` rather than `i s X` in the exponent). These are called cumulant-generating functions or CGFs.

If you have a sum of two independent random variables, the CGF of the sum is the sum of their CGFs. This means that the Taylor coefficients of the CGFs simply add together. Take derivatives of the CGF at 0 to get the "cumulants", which are pseudo-linear (additive over independent random variables, and homogeneous: the n-th cumulant satisfies `f(k X) = k^n f(X)`). So these cumulants become really convenient for characterizing the distribution as a whole: in fact, when the CGF converges near 0, having all of them lets you recover the original distribution.
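To spell out where both properties come from, here is a short derivation, writing `K_X(s) = log(E[exp(s X)])` for the CGF of X and `κ_n(X)` for its n-th derivative at 0, and assuming X and Y are independent and the expectations exist:

```latex
\begin{align*}
K_{X+Y}(s) &= \log \mathbb{E}\!\left[e^{s(X+Y)}\right]
            = \log\!\left(\mathbb{E}\!\left[e^{sX}\right]\,
                          \mathbb{E}\!\left[e^{sY}\right]\right)
            = K_X(s) + K_Y(s), \\
K_{kX}(s)  &= \log \mathbb{E}\!\left[e^{s k X}\right] = K_X(ks)
\quad\Longrightarrow\quad
\kappa_n(kX) = k^n\,\kappa_n(X).
\end{align*}
```

The first line uses independence to split the expectation of the product; the second is just the chain rule applied n times at s = 0.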

Now if you describe your distributions in terms of their cumulants, you have a special monoid:

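    -- A distribution represented by its (infinite) list of cumulants.
    data Distr n = Distr [n]
    instance (Num n) => Semigroup (Distr n) where
        Distr xs <> Distr ys = Distr (zipWith (+) xs ys)  -- termwise sum
    instance (Num n) => Monoid (Distr n) where
        mempty = Distr (repeat 0)  -- all cumulants zero: point mass at 0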

Gaussian distributions in particular have a form for their CGF which is preserved by this monoid operation, namely:

    if X ~ Gauss(m, s^2) then ln(E(exp(k X))) = m k + s^2 k^2 / 2

(Note that the constant term has to be 0, since at k = 0 we have ln(E(exp(0))) = ln(1) = 0.) Another way to state this is "the Fourier transform of a Gaussian is another Gaussian."
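To see the closure concretely, here is a small sketch. It restates the cumulant-list type so that it compiles on its own with a current GHC (which wants a `Semigroup` instance alongside `Monoid`); `gauss` and `firstCumulants` are hypothetical helpers named here for illustration. A Gaussian is the cumulant list "mean, variance, then zeros", so combining two of them can only ever give another list of that shape:

```haskell
-- Cumulant sequence (kappa_1, kappa_2, ...), restated so this compiles alone.
newtype Distr n = Distr [n]

instance Num n => Semigroup (Distr n) where
  Distr xs <> Distr ys = Distr (zipWith (+) xs ys)  -- termwise sum

instance Num n => Monoid (Distr n) where
  mempty = Distr (repeat 0)  -- point mass at 0

-- Hypothetical helper: Gauss(m, v) has cumulants m, v, 0, 0, ...
gauss :: Num n => n -> n -> Distr n
gauss m v = Distr (m : v : repeat 0)

-- Hypothetical helper: look at the first k cumulants of the infinite list.
firstCumulants :: Int -> Distr n -> [n]
firstCumulants k (Distr xs) = take k xs
```

For example, `firstCumulants 4 (gauss 1 4 <> gauss 2 9)` gives `[3,13,0,0]`: means add, variances add, and everything past the second cumulant stays zero.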

So that is super-simple, and you can already see the hints of the central limit theorem emerging: the mean of N independent, identically-distributed variables X_i will have a CGF related to the Taylor coefficients c_m of the original distribution's CGF by:

    ln(E(exp(k * sum_i X_i / N))) = N sum_m c_m k^m / N^m

so the m-th term of the Taylor expansion gets attenuated by N^(m-1), with each higher-order term shrinking by a further power of N: you can approximate the CGF with truncation.
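A quick numeric illustration of that attenuation, as a sketch: the cumulants are `κ_m = m! c_m`, so they pick up the same factor of `N^(1-m)` as the Taylor coefficients. The function names are invented for this example, and the Exponential(1) distribution (whose cumulants are `κ_m = (m-1)!`) is just an arbitrary skewed test case:

```haskell
-- Cumulants of the mean of n iid copies, given the cumulants
-- kappa_1, kappa_2, ... of a single copy: kappa_m / n^(m-1).
meanCumulants :: Integer -> [Rational] -> [Rational]
meanCumulants n kappas =
  [ k / fromInteger (n ^ (m - 1)) | (m, k) <- zip [1 :: Integer ..] kappas ]

-- Exponential(1): kappa_m = (m-1)!, i.e. 1, 1, 2, 6, 24, ...
expCumulants :: [Rational]
expCumulants = map fromInteger (scanl (*) 1 [1 ..])
```

With N = 10, `take 4 (meanCumulants 10 expCumulants)` is `[1 % 1,1 % 10,1 % 50,3 % 500]`: the mean is untouched, the variance drops by 10, and the third and fourth cumulants drop by 100 and 1000, which is why truncating after the second term gets better as N grows.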

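So, the set of all distributions forms a monoid under convolution if you allow the Dirac delta-function to act as `mempty`; in the cumulant representation, this appears as the termwise-summing monoid. Gaussian distributions (with the delta-function as the zero-variance case) are a sub-monoid of this larger monoid, and they are not the only one: Poisson distributions, for example, are also closed under convolution. One caveat: you might hope to go out to 3, 4, or 5 cumulants before setting the rest to 0 and find a similar sub-family, but by Marcinkiewicz's theorem a CGF that is a polynomial has degree at most 2, so those truncated cumulant lists are closed under the termwise sum without corresponding to actual probability distributions.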