Hacker News
by ogogmad 1520 days ago
I thought entropy (in the Shannon sense) was a property of discrete and finite probability distributions. It's essentially a measure of how random a sample from such a probability distribution is. Notably, continuous probability distributions don't have meaningful entropy (or in some sense, their entropy is always infinite). It's worth considering the similarities and differences between entropy and standard deviation.
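A minimal sketch of the "always infinite" remark, assuming the continuous distribution is a uniform one on [0, 1) that gets cut into equal bins. The helper names `shannon_entropy` and `binned_uniform_entropy` are made up for illustration. The point is only that the discrete entropy keeps growing as the bins get finer, so there is no finite limit to assign to the continuous distribution itself.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a finite discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def binned_uniform_entropy(n_bins):
    """Entropy in bits of Uniform[0, 1) after cutting it into n_bins equal bins."""
    return shannon_entropy([1.0 / n_bins] * n_bins)

# A point mass has no randomness; a fair coin has exactly one bit.
print(shannon_entropy([1.0]), shannon_entropy([0.5, 0.5]))

# Finer bins give log2(n_bins) bits, which grows without bound.
for n_bins in (2, 16, 1024):
    print(n_bins, binned_uniform_entropy(n_bins))
```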

I thought the 2nd law of thermodynamics was saying that with incomplete knowledge, the probability distribution of possible states becomes more and more spread out as time goes on. It's almost a limit to how you can make predictions or simulations of physics when the initial state of the system is not fully known. Equivalently, it's a banal statement about chaos in the sense of chaos theory.

The only thing I don't get is how physicists get around the discrete and finite restriction. Maybe the state of the system is not what has entropy. Rather, one can define an arbitrary function f from the system to a finite set S, and then talk about the entropy of f(System at time t), because this is indeed a discrete and finite probability distribution which you can take the entropy of.
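A toy sketch of that idea, under heavy assumptions: the "system" is a set of independent random walkers in a box [0, 1] with reflecting walls, and the function f (called `coarse_grain` here, a hypothetical name) just reports which of a few equal cells a walker is in. None of this comes from the article; it only illustrates taking the entropy of f(System at time t) and watching it tend to grow as the walkers spread out.

```python
import math
import random
from collections import Counter

def shannon_entropy(counts):
    """Entropy in bits of the empirical distribution given by the counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def coarse_grain(x, n_cells):
    """Hypothetical f: map a position in [0, 1] to one of n_cells labels."""
    return min(int(x * n_cells), n_cells - 1)

def simulate(n_particles=2000, n_steps=200, n_cells=8, step=0.02, seed=0):
    """Return the coarse-grained entropy at each time step."""
    rng = random.Random(seed)
    xs = [0.05] * n_particles  # everyone starts inside the first cell
    entropies = []
    for _ in range(n_steps + 1):
        cells = Counter(coarse_grain(x, n_cells) for x in xs)
        entropies.append(shannon_entropy(cells.values()))
        moved = []
        for x in xs:
            x += rng.uniform(-step, step)
            if x < 0:
                x = -x       # reflect off the left wall
            if x > 1:
                x = 2 - x    # reflect off the right wall
            moved.append(x)
        xs = moved
    return entropies

entropies = simulate()
print(entropies[0], entropies[-1])
```

The entropy can never exceed log2(n_cells) bits, which is the value for walkers spread evenly over all cells.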

Hmmm. Maybe I understand entropy.

9 comments

> The only thing I don't get is how physicists get around the discrete and finite restriction.

Actually, they don't! When you start doing the math about states in a quantum sense (i.e. statistical mechanics), the basic premise is that the available range of states _is_ discrete. Particles are quantized - so they can only possess certain allowable discrete energy levels. The broader laws of thermodynamics fall out of that and appear to be continuous as you scale up to the macro world across a huge number of microstates.

I think this comment is significantly more insightful than the article.

As for the thing you don't get: quantum mechanics means that the state space is actually discrete, which means there is no need to pass to a continuous distribution. And finiteness is not really a concern either: first of all it is not strictly necessary for the (Gibbs) entropy to be defined, and secondly the state space is often finite once e.g. the total energy in the system is fixed.

> I thought entropy (in the Shannon sense) was a property of discrete and finite probability distributions. It's essentially a measure of how random a sample from such a probability distribution is. Notably, continuous probability distributions don't have meaningful entropy (or in some sense, their entropy is always infinite).

True, but for continuous distributions you can use the KL divergence against a uniform distribution :)

One of the properties of entropy H(X) of a random variable X is that if f is a bijective function then H(f(X)) = H(X).
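A quick check of that property in the discrete case, as a sketch. The `pushforward` helper is a made-up name for "the distribution of f(X)". The fair distribution on {-1, 1} is the example that comes up later in the thread: a bijective relabelling leaves the entropy alone, while squaring (not bijective here) merges the two outcomes.

```python
import math
from collections import defaultdict

def shannon_entropy(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

def pushforward(dist, f):
    """Distribution of f(X) when X has distribution dist."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[f(x)] += p  # outcomes that collide under f pool their probability
    return dict(out)

X = {-1: 0.5, 1: 0.5}
relabelled = pushforward(X, lambda x: 10 * x + 3)  # bijective on {-1, 1}
squared = pushforward(X, lambda x: x * x)          # maps both outcomes to 1

print(shannon_entropy(X), shannon_entropy(relabelled), shannon_entropy(squared))
```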

For relative entropy (or "KL divergence" as some people call it), we have that H(X||Y) = H(f(X)||f(Y)). But if you fix Y to have a continuous uniform distribution, then you lose this critical property because f(Y) may no longer have a continuous uniform distribution.

Apparently this "critical property" is not so important to all the people who use relative entropy as a generalization to a continuous distribution defined on a space with an underlying measure.

Why would they care about arbitrary transformations mapping points in the space to other points in the space?

What I think it means is that if you take two different parametrizations of the same physical phenomenon, then you get two different entropy values.

E.g. suppose you have a bunch of particles with fixed mass. You could look at the distribution of speeds and get one entropy, then at the distribution of kinetic energy (basically speed squared) and get another. Uniform speed means non-uniform speed squared, so the entropies would disagree.
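A rough numerical version of this example, assuming speeds are uniform on (0, 1) in arbitrary units (my assumption, not something from the thread). It uses the standard change-of-variables fact for differential entropy, h(g(X)) = h(X) + E[log |g'(X)|] for a monotone differentiable g, and h(X) = 0 nats for Uniform(0, 1). The function name is made up.

```python
import math

def entropy_after_map(log_abs_derivative, n=100_000):
    """Differential entropy (nats) of g(X) for X ~ Uniform(0, 1).

    Uses h(g(X)) = h(X) + E[log |g'(X)|] with h(X) = 0, and approximates
    the expectation with a midpoint rule on n points.
    """
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        total += log_abs_derivative(x)
    return total / n

# g(v) = v: the derivative is 1, so log |g'| = 0 everywhere.
h_speed = entropy_after_map(lambda v: 0.0)

# g(v) = v**2: the derivative is 2v, so log |g'| = log(2v).
h_speed_squared = entropy_after_map(lambda v: math.log(2 * v))

print(h_speed, h_speed_squared, math.log(2) - 1)
```

The second value should land close to log(2) - 1, which is negative, so the two parametrizations really do give different differential entropies.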

This sounds like it could pose issues.

Physical entropy is defined from the probability distribution over states. Velocities or squared-velocities are not states, they are derived quantities. Points in a phase space would describe states. Physical states are discrete anyway when you consider quantum physics :-)

As for the entropy of probability distributions in general, I think relative entropy is invariant under reparametrizations because both the probability of interest and the reference probability transform in the same way [1]. But I don't remember exactly what that means. [And I am not sure if that makes ogogmad wrong, I may not have understood his comment well.]
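Here is a small numerical check of that invariance, as a sketch with made-up densities: p(x) = 2x and a uniform reference q(x) = 1 on (0, 1), reparametrized by y = x^2. Both densities are transformed with the same Jacobian factor 1/(2 sqrt(y)), which gives p_Y(y) = 1 and q_Y(y) = 1/(2 sqrt(y)). The integrals use a plain midpoint rule, so the agreement is only approximate.

```python
import math

def midpoint_integral(f, n=200_000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    return sum(f((i + 0.5) / n) for i in range(n)) / n

def kl_divergence(p, q):
    """Relative entropy (nats) between densities p and q on (0, 1)."""
    return midpoint_integral(lambda t: p(t) * math.log(p(t) / q(t)))

# Original parametrization x.
def p_x(x):
    return 2 * x

def q_x(x):
    return 1.0

# Same two distributions after y = x**2.
def p_y(y):
    return 1.0

def q_y(y):
    return 1 / (2 * math.sqrt(y))

kl_x = kl_divergence(p_x, q_x)
kl_y = kl_divergence(p_y, q_y)
print(kl_x, kl_y, math.log(2) - 0.5)
```

Note that the transformed reference q_Y is no longer uniform. If you insisted on a uniform reference in the y coordinate as well, the number would change.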

([Edit: forget this aside. You probably were talking about speeds as positive magnitudes.] By the way, using an example analogous to yours, discrete entropy wouldn't be invariant either: if you have a distribution on {-1,1} and square it, it collapses to a zero-entropy singleton {1}.)

[1] https://en.wikipedia.org/wiki/Kullback–Leibler_divergence#Pr...

+1. The commenter above also cared about bijective mappings, and squaring a random variable in [-1, 1] is not bijective. Squaring a random variable defined over positive real numbers would lead to a bijective mapping and the distribution would still remain uniform.

Actually, I find it hard to come up with a bijective mapping that leads to a non-uniform distribution that's useful for anything practical.

Yeah, you also have to transform the "reference" function, and then the entropy stays the same. I prefer to think of it as the "density of states" -- it's necessary to make the argument of the logarithm dimensionless, after all.

> I thought the 2nd law of thermodynamics was saying that with incomplete knowledge, the probability distribution of possible states becomes more and more spread out as time goes on. It's almost a limit to how you can make predictions or simulations of physics when the initial state of the system is not fully known. Equivalently, it's a banal statement about chaos in the sense of chaos theory.

I'm not sure I understand what you mean by "as time goes on". Classical thermodynamic entropy is defined for a system in equilibrium and it doesn't change with time. It changes when you do things to the system.

I don't think statistical mechanics entropy is limited in this way. I think the (incorrect? oversimplified?) definition given in the article is only valid under the conditions you've given. But I'm not sure.

Then maybe it depends on what you meant by "the 2nd law of thermodynamics".

In Shannon’s 1948 paper, part V deals with continuous sources. The key is to realise that you cannot measure a continuous signal exactly, and so you can define a rate of information relative to the fidelity of your measurement. (I only skimmed that part years ago, and never studied it carefully. But it makes perfect sense.)

If you mean differential entropy (which Shannon supposedly suggested as a generalisation to continuous random variables), this is not a good generalisation of entropy to continuous random variables. It lacks all the interesting properties of entropy.

The "proper" generalisation of entropy to continuous random variables is something called relative entropy, or in some books it's called KL divergence. But this is now a property of how two probability distributions relate to each other, rather than a property of a single probability distribution alone.

I'm not an expert in probability theory or physics, but this is what I've learnt from a brief study of these areas.

Relative entropy? KL? Ah, found it – Kullback–Leibler divergence, it’s called. Thanks, I’ll put that on my list of stuff to learn about.

> how physicists get around the discrete and finite restriction

By turning a sum into an integral. The probability 'density' is p(x), and the 'density of states' is n(x), so the entropy is then minus the integral of p(x) log(p(x)/n(x)) dx.

Right, it requires a sort of alphabet of discrete specific states. Discrete locations in space, discrete numbers of things and discrete kinds of things.

yeah. i think of minimum entropy as a Dirac delta distribution and maximum entropy as a flat uniform random distribution.

i never really understood the physical definition, but always handwaved it away with "things dissipate over time into an undetectable signal, or a flat distribution"

You basically nailed it!