by banachtarski 2956 days ago
"Cross-entropy is defined as the difference between the following two probability distributions"

Huh? No, this is a mathematically imprecise statement (and not correct either). Most explanations refer to information theory, where perfect knowledge of the desired probability distribution leads to an optimal allocation of bits in a binary encoding. The cross-entropy is the expected number of bits per symbol when this allocation is done using an incorrect distribution instead. The goal is to minimize this, which is why it is suitable for use as a loss function.
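
To make the bit-allocation reading concrete, here is a minimal sketch in Python. The distributions `p` and `q` are made-up numbers, and the function names are only illustrative. It assumes `q` assigns nonzero probability wherever `p` does; otherwise the cross-entropy is infinite and `math.log2` raises an error.

```python
import math

def entropy(p):
    """Expected bits per symbol when the code is built for the true distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expected bits per symbol when symbols follow p but the code is built for q.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: three symbols, true distribution p, incorrect model q.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

h_p = entropy(p)            # bits per symbol with the correct code
h_pq = cross_entropy(p, q)  # bits per symbol with the code built for q

print(h_p, h_pq, h_pq - h_p)
```

With these numbers the code built for `q` costs more bits per symbol than the code built for `p`, and the two coincide only when `q` equals `p`. The gap between them is the KL divergence.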


> The entropy is the expected number of bits when this allocation is done using the incorrect distribution

Is there any source that would derive and/or explain this more in-depth? I've been trying to develop an intuition for this, but haven't come across a good explanation.

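The other reply mentioning "kullback-leibler divergence" (aka KL divergence) points at what you need to understand, as this is the fundamental concept. Cross-entropy equals the entropy of the true distribution plus the KL divergence from the true distribution to the model. The entropy term does not depend on the model, so minimizing the KL divergence is equivalent to minimizing the given "cross-entropy loss" expression. More generally, to understand where this comes from, you'll want to read about information theory.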
the OG http://math.harvard.edu/~ctm/home/text/others/shannon/entrop...
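
For anyone who wants the equivalence written out, here is a short derivation. The notation is my own choice, not from the article: p is the true distribution, q is the model, and the sums run over outcomes x with p(x) > 0.

```latex
\begin{aligned}
H(p, q) &= -\sum_x p(x) \log q(x) \\
        &= -\sum_x p(x) \log p(x) + \sum_x p(x) \log \frac{p(x)}{q(x)} \\
        &= H(p) + D_{\mathrm{KL}}(p \,\|\, q)
\end{aligned}
```

Since H(p) does not depend on q, the q that minimizes H(p, q) is the same q that minimizes the KL divergence.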

also look up Kullback-Leibler divergence