by danbruc 1567 days ago
In general, we consider data with high entropy to be less informative, and data with less entropy to be more informative.

This is exactly backwards: larger entropy means larger information content. If the circles are either all red or all blue, you need only one bit in total to distinguish between those two possibilities. If half of the circles are red and the other half blue, in no particular arrangement, you need about one bit per circle to describe which circle has which color.
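To put numbers on this, here's a small Python sketch that computes Shannon entropy, H = -Σ p·log2(p), over the empirical color frequencies. The helper name `entropy_bits` is my own, not from the article, and treating each circle as a draw from the observed frequencies is a simplifying assumption.

```python
import math
from collections import Counter


def entropy_bits(outcomes):
    """Shannon entropy (in bits) of the empirical distribution of `outcomes`.

    Assumption: each outcome is treated as a draw from the observed frequencies.
    """
    counts = Counter(outcomes)
    n = len(outcomes)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


all_red = ["red"] * 8
half_and_half = ["red", "blue"] * 4

# Per circle: no uncertainty at all about any single circle's color.
print(entropy_bits(all_red))
# Per circle: each color has probability 1/2, so 1 bit per circle.
print(entropy_bits(half_and_half))
# Choosing between the whole scenarios "all red" and "all blue": 1 bit in total.
print(entropy_bits(["all red", "all blue"]))
```

The all-red collection comes out at 0 bits per circle, the 50/50 mix at 1 bit per circle (so 8 bits for 8 circles), and the choice between the two uniform scenarios at 1 bit overall. The most "predictable" data is the one that carries the least information.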


This is just the classic problem of mixing technical terms (like "information") with their dictionary definitions. Personally, I believe they intend to say the same thing you do. What they are trying to say is that the lower the entropy, the closer the distribution is to a Dirac delta. In their mind, this means you know exactly what the outcome will be, and hence the data is "informative". But, as you point out, that just means you already know everything, which is the exact opposite of information. In the context of Wordle, guessing a word with 0 entropy would be a wasted guess: the feedback is certain in advance, so every candidate word that remained before the guess would still remain after it. For example, guessing a word that has already been guessed. How informative!
Thanks for this. I was following along just fine until that last sentence, which caused me to think "nope, I guess I don't follow at all".
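For anyone else stuck on that last sentence, here's a minimal Python sketch of the Wordle point. It treats the "entropy of a guess" as the entropy of the feedback pattern it produces, assuming every remaining candidate is equally likely. The scoring function is a hand-rolled approximation of Wordle's rules, and the six-word list and function names are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def feedback(guess, answer):
    """Hand-rolled approximation of Wordle scoring (an assumption, not the official code).

    'G' = right letter, right spot; 'Y' = letter elsewhere in the answer; '.' = absent.
    """
    result = ["."] * len(guess)
    unmatched = Counter()  # answer letters not already matched as green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] != "G" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def guess_entropy(guess, candidates):
    """Entropy (bits) of the feedback pattern, assuming all candidates are equally likely."""
    patterns = Counter(feedback(guess, answer) for answer in candidates)
    n = len(candidates)
    return sum(-(c / n) * math.log2(c / n) for c in patterns.values())


# Hypothetical tiny word list; "slate" plays the role of the secret answer.
words = ["crane", "crate", "grate", "plate", "slate", "trace"]
seen = feedback("crane", "slate")
remaining = [w for w in words if feedback("crane", w) == seen]

print(remaining)                          # candidates consistent with the first guess
print(guess_entropy("crane", remaining))  # repeat guess: every candidate gives the same pattern
print(guess_entropy("slate", remaining))  # a new guess that splits the candidates
```

After guessing "crane", only "plate" and "slate" are consistent with the feedback. Guessing "crane" again yields 0 bits, because both survivors already produce the same pattern, so nothing gets eliminated. Guessing "slate" instead yields 1 bit, because it separates the two.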