by pizza
66 days ago
OP is correct: surprisal is outcome-dependent and entropy is distribution-dependent.

- entropy is E_p[informativeness of measuring outcome x], i.e. the expected surprisal E_p[-log p(x)]
- take n outcomes; a distribution over them lives on the simplex \Delta^{n-1}. you can lift this into R^n via the log map p_k -> x_k = log p_k. now x \in R^n describes a histogram, but one with only n-1 degrees of freedom (the coordinates are tied together by sum_k exp(x_k) = 1)
- in this log-probability space, measuring outcome k is literally a linear functional: x -> e_k · x = x_k = log p_k, the coordinate selected by the index k. surprisal is just its negation, -log p_k
- imo surprisal of some p(x) is best understood as "the length of a pointer", entropy as "the probability-weighted average length of a pointer", and collision entropy (-log sum_k p_k^2, i.e. -log of the chance that two independent draws land on the same outcome) as "how specific you would have to be to describe witnessing a specific outcome"

and in the same way, you might get by calling a single molecule of water dry.
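a minimal python sketch of the log map and of "measurement as a linear functional", assuming a made-up dyadic distribution p (the names log_map, softmax and measure are illustrative, not from any library). it shows that shifting every x_k by the same constant lands on the same p (hence n-1 degrees of freedom), and that observing outcome k is the dot product e_k · x, whose negation is the surprisal.

```python
import math

def log_map(p):
    """Lift a distribution on the simplex Delta^(n-1) into R^n: p_k -> x_k = log p_k."""
    return [math.log(pk) for pk in p]

def softmax(x):
    """Map any x in R^n back onto the simplex.

    Adding the same constant to every x_k yields the same p, which is why
    the n coordinates carry only n-1 degrees of freedom.
    """
    m = max(x)  # subtracting the max for numerical stability is itself such a shift
    exps = [math.exp(xk - m) for xk in x]
    z = sum(exps)
    return [e / z for e in exps]

def measure(k, x):
    """Observing outcome k as a linear functional on R^n: e_k . x = x_k."""
    e_k = [1.0 if i == k else 0.0 for i in range(len(x))]
    return sum(ei * xi for ei, xi in zip(e_k, x))

# Hypothetical example distribution over n = 4 outcomes.
p = [0.5, 0.25, 0.125, 0.125]
x = log_map(p)
shifted = [xk + 3.0 for xk in x]     # a different point in R^n ...
print(softmax(shifted))              # ... that maps back to (approximately) the same p
print(-measure(2, x) / math.log(2))  # surprisal of outcome 2 in bits, approximately 3.0
```

the max-subtraction inside softmax is the same shift invariance described above, just used for numerical stability.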
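to make "length of a pointer" concrete, here's a minimal python sketch that uses a huffman code as the stand-in for a pointer (that choice is mine, not the comment's) and computes surprisal, entropy and collision entropy in bits for a made-up distribution p. for a dyadic p like this one, huffman codeword lengths equal the surprisals exactly and the average codeword length equals the entropy; for non-dyadic p the average length is only guaranteed to land in [H, H + 1). all function names are hypothetical.

```python
import heapq
import math

def surprisal_bits(p_x):
    """-log2 p(x): depends only on the probability of the single outcome observed."""
    return -math.log2(p_x)

def entropy_bits(p):
    """Shannon entropy: the probability-weighted average surprisal, E_p[-log2 p(x)]."""
    return sum(pk * surprisal_bits(pk) for pk in p if pk > 0)

def collision_entropy_bits(p):
    """Renyi entropy of order 2: -log2 of the chance that two independent draws coincide."""
    return -math.log2(sum(pk * pk for pk in p))

def huffman_lengths(p):
    """Codeword ('pointer') length for each outcome under a Huffman code built from p."""
    # Heap entries: (subtree probability, unique tiebreak counter, outcome indices in subtree).
    heap = [(pk, i, [i]) for i, pk in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    counter = len(p)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for i in a + b:
            lengths[i] += 1  # merging two subtrees adds one bit to every codeword inside them
        heapq.heappush(heap, (p1 + p2, counter, a + b))
        counter += 1
    return lengths

# Hypothetical dyadic example distribution.
p = [0.5, 0.25, 0.125, 0.125]
print([surprisal_bits(pk) for pk in p])  # [1.0, 2.0, 3.0, 3.0]
print(huffman_lengths(p))                # [1, 2, 3, 3]
print(entropy_bits(p))                   # 1.75
print(collision_entropy_bits(p))         # about 1.54, never more than the Shannon entropy
```

note that the entropy weights each pointer length by how likely the outcome is, which is why rare outcomes, despite their long pointers, contribute little to the average.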