by tziki 2956 days ago
> The entropy is the expected number of bits when this allocation is done using the incorrect distribution

Is there any source that would derive and/or explain this more in-depth? I've been trying to develop an intuition for this, but haven't come across a good explanation.

3 comments

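The other reply mentioning Kullback-Leibler divergence (aka KL divergence) points at the fundamental concept you need to understand. The cross-entropy equals the entropy of the true distribution plus the KL divergence from the true distribution to the model's, and the entropy term doesn't depend on the model, so minimizing the KL divergence is equivalent to minimizing the given "cross-entropy loss" expression. More generally, to understand where this comes from, you'll want to read about information theory.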
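
If it helps the intuition, here is a minimal sketch of that relationship, not taken from the article: the distributions p and q and the helper names are made up for illustration. It checks numerically that H(p, q) = H(p) + KL(p || q), i.e. the extra bits you pay for coding with the wrong distribution q are exactly the KL divergence.

```python
import math

# Hypothetical example distributions (assumptions, not from the article):
# p is the "true" distribution, q is the incorrect one used to assign code lengths.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

def entropy(p):
    """Expected bits per symbol when code lengths come from p itself."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expected bits per symbol when symbols follow p but code lengths come from q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """Extra bits per symbol paid for using q instead of p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

h_p = entropy(p)            # 1.5 bits
h_pq = cross_entropy(p, q)  # 1.75 bits
kl = kl_divergence(p, q)    # 0.25 bits

print(h_p, h_pq, kl)
print(math.isclose(h_pq, h_p + kl))  # True
```

Since H(p) is fixed by the data, the only part of H(p, q) you can push down by changing q is the KL term, which is zero exactly when q matches p.
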
the OG http://math.harvard.edu/~ctm/home/text/others/shannon/entrop...

also look up Kullback-Leibler divergence