Hacker News
by pizza 66 days ago
OP is correct; surprisal is outcome-dependent and entropy is distribution-dependent

- entropy is E_p[informativeness of measuring outcome x], i.e. H(p) = E_p[-log p(x)], the expected surprisal

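- take n outcomes; a distribution over them lives on the simplex \Delta^{n-1}. you can lift this to R^n via the log map p_k -> x_k = log p_k. going back, any x \in R^n gives a distribution via softmax, p_k = exp(x_k) / \sum_j exp(x_j), and shifting every x_k by the same constant gives the same p. so only the differences x_k - x_j = log(p_k / p_j), the log odds, matter, and x \in R^n describes a histogram with n-1 degrees of freedom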

- in log space, measuring outcome k is literally a linear functional on the vector of log probabilities: it picks out the coordinate x_k = <e_k, x>, and the surprisal of k is just -x_k.

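- imo the surprisal of an outcome x, -log p(x), is best understood as "the length of a pointer" to x, entropy as "the probability-weighted average length of a pointer", and collision entropy, -log \sum_x p(x)^2, as "how specific you would have to be to describe witnessing a specific outcome" (formally, the surprisal of two independent draws landing on the same outcome)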
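A minimal Python sketch of the log-space picture in the bullets above, assuming natural logs and a distribution with strictly positive entries (log 0 is undefined). The helper names `log_lift`, `softmax` and `measure` are made up for illustration; softmax is used here as the standard inverse of the log lift. It checks that a constant shift of x leaves the histogram unchanged, and that observing outcome k is just reading off one coordinate with a linear functional.

```python
import math

def log_lift(dist: list[float]) -> list[float]:
    """Lift a point in the interior of the simplex to R^n: x_k = log p_k (natural log, needs p_k > 0)."""
    return [math.log(p) for p in dist]

def softmax(x: list[float]) -> list[float]:
    """Map any x in R^n back onto the simplex: p_k = exp(x_k) / sum_j exp(x_j)."""
    m = max(x)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def measure(x: list[float], k: int) -> float:
    """Observing outcome k as the linear functional x -> <e_k, x> = x_k."""
    e_k = [1.0 if j == k else 0.0 for j in range(len(x))]
    return sum(a * b for a, b in zip(e_k, x))

dist = [0.5, 0.25, 0.25]
x = log_lift(dist)

# Shifting every coordinate by the same constant gives the same histogram,
# so only n - 1 directions in R^n carry information (the log odds x_k - x_j).
shifted = [v + 7.0 for v in x]
print(softmax(x))
print(softmax(shifted))

# The surprisal of outcome k is minus the measured coordinate (divide by ln 2 for bits).
print(-measure(x, 1) / math.log(2))
```

Both `softmax` calls print (up to float rounding) the original histogram, which is the n-1 degrees of freedom point in concrete form.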
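And a similar sketch of the pointer-length reading, in bits. The example distribution and its prefix code are chosen for illustration, and the function names are hypothetical. For a dyadic distribution (every p a power of 1/2), an optimal prefix code gives each outcome a codeword exactly -log2 p bits long, so surprisal really is the pointer length and entropy is the expected pointer length.

```python
import math

def surprisal(p: float) -> float:
    """Surprisal of a single outcome with probability p, in bits: -log2 p."""
    return -math.log2(p)

def entropy(dist: list[float]) -> float:
    """Shannon entropy in bits: the probability-weighted average surprisal, E_p[-log2 p(x)]."""
    return sum(p * surprisal(p) for p in dist if p > 0)

def collision_entropy(dist: list[float]) -> float:
    """Collision (Rényi order-2) entropy in bits: -log2 of the chance that two
    independent draws from dist land on the same outcome."""
    return -math.log2(sum(p * p for p in dist))

# A dyadic example distribution and an optimal prefix code for it.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codes = {"a": "0", "b": "10", "c": "110", "d": "111"}

dist = list(probs.values())
for sym, p in probs.items():
    # For dyadic probabilities, each codeword ("pointer") is exactly surprisal(p) bits long.
    print(sym, surprisal(p), len(codes[sym]))

print(entropy(dist))                      # 1.75: the expected pointer length
print(round(collision_entropy(dist), 2))  # 1.54: never exceeds the Shannon entropy
```

With a non-dyadic distribution the optimal codeword lengths are whole numbers and only approximate -log2 p, but the expected length of an optimal prefix code still stays within 1 bit of the entropy.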

and in the same way, you might get away with calling a single molecule of water dry