Perhaps I'm misunderstanding or missing something, but I'm afraid this seems completely wrongheaded to me. (My apologies for being so blunt, but right now your comment appears to be the most-upvoted, and I therefore think it needs some pushback.)

[EDITED to add: I was looking at an old version of the page; by the time I wrote this the parent was no longer the top comment. I'll leave the bluntness in, especially as at least one other person was even blunter.]

You refer to "the mean" and I think you mean the mean of the probabilities. Now, when you've got a probability distribution, by far the usual thing for "the mean" to mean is the sum of Pr(x) x -- the mean of the values. Taking the mean of the probabilities is a really strange thing to do.

One reason why it's a really strange thing to do is that this thing you call n is really kinda meaningless. There's no difference between these two probability distributions: (a) 1, 2, 3, or 4, with probabilities 0.1, 0.2, 0.3, 0.4 respectively; (b) 1, 2, 3, 4, or 5, with probabilities 0.1, 0.2, 0.3, 0.4, 0 respectively. But (a) has n=4 and (b) has n=5. Maybe you want n to be the number of nonzero probabilities? But now consider (a) along with the following probability distribution parameterized by a (small, positive) number h: 1, 2, 3, 4, or 4+h, with probabilities 0.1, 0.2, 0.3, 0.4-h, h. Every version of this distribution with h>0 has n=5, but when h is very small it's practically indistinguishable from (a) with n=4.
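
To make that concrete, here is a minimal sketch that computes n and the Shannon entropy (in nats) for (a), (b), and the h-parameterized family. The helper names `entropy` and `with_sliver` are my own, invented for illustration. The point is that n jumps from 4 to 5 as soon as h is nonzero, while the entropy moves only slightly and approaches that of (a) as h shrinks.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

a = [0.1, 0.2, 0.3, 0.4]        # distribution (a): n = 4
b = [0.1, 0.2, 0.3, 0.4, 0.0]   # distribution (b): n = 5, same distribution

def with_sliver(h):
    """The h-parameterized distribution: n = 5 for every h > 0."""
    return [0.1, 0.2, 0.3, 0.4 - h, h]

print(len(a), entropy(a))
print(len(b), entropy(b))

# n stays at 5 throughout, but the entropy gap to (a) shrinks with h.
for h in (1e-2, 1e-4, 1e-6):
    d = with_sliver(h)
    print(h, len(d), entropy(d) - entropy(a))
```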

Further, since the sum of probabilities is always 1, what you write as sum (p 1/n) is just the same as the number 1/n. You can call it "the mean" if you want to, but I don't see what this adds over calling it what it is: the reciprocal of the number of possibilities.
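
Spelled out, writing p_i for the probabilities and n for the number of outcomes (my notation, which may not match the parent's exactly):

```latex
\sum_{i=1}^{n} p_i \cdot \frac{1}{n}
  = \frac{1}{n} \sum_{i=1}^{n} p_i
  = \frac{1}{n} \cdot 1
  = \frac{1}{n}
```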

There is something to what you say: the entropy is kinda related to the number of possibilities; if the probabilities are all equal, the entropy is log(#possibilities); if the probabilities are equal-ish then it's modestly smaller than that. But note e.g. that this relationship is exactly the inverse of what you say, in that "the mean" decreases with the number of possibilities, and the entropy increases with the number of possibilities.
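
A small sketch of both halves of that, assuming natural logarithms and a helper `entropy` of my own naming: for uniform distributions the entropy equals log(n) and grows with n while 1/n shrinks, and for an uneven distribution over four outcomes the entropy falls somewhat below log(4).

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Uniform case: entropy matches log(n) and rises as 1/n falls.
for n in (2, 4, 8, 16):
    uniform = [1.0 / n] * n
    print(n, 1.0 / n, entropy(uniform), math.log(n))

# Non-uniform case: entropy is below log(number of outcomes).
uneven = [0.1, 0.2, 0.3, 0.4]
print(entropy(uneven), math.log(len(uneven)))
```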

The entropy is not "a measure of the mean". It kinda-sorta is related to "the number of possibilities", which is the reciprocal of "the mean". It is not at all the case, as your last paragraph suggests, that for most purposes we should be using "the mean" but we need to use the entropy when "the number of modes ... is not handled well by simpler metrics", whatever that means; for most purposes we should be using the entropy, and in the special case where all the probabilities are equal we can get away with just counting possibilities.

(In some important situations it turns out that what you have is some number of possibilities with roughly equal probabilities, and a whole lot more whose probabilities rapidly decrease to almost zero, and then you can get away with counting the number of reasonably-probable possibilities and taking its log. E.g., various situations in communications theory can fruitfully be thought of this way. But the entropy is still the more fundamental quantity, and "the mean" is still a needless obfuscation of "the (effective) number of possibilities".)
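
As a rough illustration of that last situation, here is a sketch with made-up numbers: `K` roughly equiprobable outcomes plus a geometrically decaying tail of `TAIL` further outcomes sharing a total mass of `EPS`. All three constants are hypothetical choices of mine. The raw count of outcomes is large, but the entropy stays close to log(K), and exp(entropy) stays close to K.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

K = 8        # hypothetical number of roughly equiprobable outcomes
TAIL = 50    # hypothetical number of extra, nearly negligible outcomes
EPS = 1e-3   # hypothetical total probability mass given to the tail

head = [(1.0 - EPS) / K] * K
weights = [0.5 ** i for i in range(TAIL)]            # rapidly decaying weights
tail = [EPS * w / sum(weights) for w in weights]     # rescaled to total EPS
probs = head + tail

H = entropy(probs)
print(len(probs))        # raw count of outcomes with nonzero probability
print(H, math.log(K))    # entropy vs. log of the "reasonably probable" count
print(math.exp(H))       # effective number of possibilities
```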