HN Mirror



	by Exuma 1081 days ago
	Care to explain cross entropy simply? That’s where I’m currently stuck.


mellavora 1080 days ago

Shooting from the hip:

Entropy of a single signal, say a sequence of letters like "ababababab", is the (probability-weighted) "average" surprise per letter. If the letters are uniformly distributed, each letter is equally likely/unlikely to come next in the sequence, whereas if instead one letter occurs only 1/1000th of the time (aaa....aaa...aa..a.z.aaaa), then when the rare beast shows up, it is a big surprise, so the total amount of surprise available in the sequence is high.

That's entropy.
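A minimal sketch of "average surprise per letter", assuming each position is an independent draw and estimating probabilities from the letter counts in the string (the function names are hypothetical). The surprise of a single symbol with probability p is -log2(p) bits; entropy averages that over the symbols, weighted by how often each occurs.

```python
import math
from collections import Counter

def surprisal_bits(p: float) -> float:
    """Surprise of a single event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

def empirical_entropy_bits(seq: str) -> float:
    """Average surprise per symbol, using the symbol frequencies in seq
    as probabilities (treats every position as an independent draw)."""
    n = len(seq)
    counts = Counter(seq)
    return sum((c / n) * surprisal_bits(c / n) for c in counts.values())

print(empirical_entropy_bits("ababababab"))  # 1.0
```

For "a" * 999 + "z", surprisal_bits(1/1000) is about 9.97 bits for the lone z, while empirical_entropy_bits returns only about 0.0114 bits per symbol: the rare letter is a big surprise, but it is averaged in with 999 unsurprising ones.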

The same thing would be true for a sequence of numbers.

But what if there is some relationship? Say aaabaa frequently co-occurs with 111211 when you line the two sequences up by timestamp.

In this simple case, if you know the letters and can spot the relationship, then there is zero surprise in the number sequence. The cross entropy of "letters plus numbers" is the same as the entropy of "letters" or of "numbers" in isolation.

And as you move away from the 1:1 correspondence, you'll see the cross entropy increase until it reaches its maximum at entropy(letters) + entropy(numbers), which is the case where no information is shared between the two systems.
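A small sketch that measures this on two aligned sequences. On terminology: the entropy of the combined "letters plus numbers" signal is usually called the joint entropy H(X, Y), and the information shared between the two signals is the mutual information I(X; Y) = H(X) + H(Y) − H(X, Y). The function names are hypothetical, and probabilities are estimated from symbol counts, assuming each position is an independent draw.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def entropy_bits(symbols: Sequence[Hashable]) -> float:
    """Empirical entropy in bits per symbol, from symbol frequencies."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def joint_entropy_bits(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Entropy of the aligned pairs (x_t, y_t), i.e. 'letters plus numbers'."""
    if len(xs) != len(ys):
        raise ValueError("sequences must be aligned (same length)")
    return entropy_bits(list(zip(xs, ys)))

def mutual_information_bits(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Information shared by the two signals: H(X) + H(Y) - H(X, Y)."""
    return entropy_bits(xs) + entropy_bits(ys) - joint_entropy_bits(xs, ys)

# 1:1 correspondence: knowing the letter tells you the number.
print(joint_entropy_bits("aaabaa", "111211"), entropy_bits("aaabaa"))
# No relationship: every (letter, number) pair is equally likely.
print(joint_entropy_bits("abab", "1122"), mutual_information_bits("abab", "1122"))
```

In the first case the joint entropy equals the entropy of the letters alone (about 0.650 bits), so all of it is shared. In the second, the joint entropy is 2.0 bits, exactly H(X) + H(Y), and the mutual information is 0.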

To bring it home, I think of cross entropy as the amount of information shared between two signals.

Others might think of it slightly differently.
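For comparison, the standard definition of cross entropy (the one used as a classification loss) is between two distributions over the same symbols: H(p, q) = −Σ p(x) log2 q(x), the average surprise you experience when symbols really come from p but you predict them with q. It equals H(p) when q = p and is larger otherwise; the excess is the KL divergence. A minimal sketch, with a hypothetical function name and distributions given as dicts:

```python
import math
from typing import Dict

def cross_entropy_bits(p: Dict[str, float], q: Dict[str, float]) -> float:
    """H(p, q) = -sum_x p(x) * log2 q(x).
    Symbols with p(x) == 0 contribute nothing; if q assigns zero
    probability to a symbol that p can produce, the result is infinite."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf
        total -= px * math.log2(qx)
    return total

true_p = {"a": 0.5, "b": 0.5}
print(cross_entropy_bits(true_p, true_p))                # 1.0, equal to H(p)
print(cross_entropy_bits(true_p, {"a": 0.9, "b": 0.1}))  # about 1.737, worse predictions
```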


tczMUFlmoNk 1080 days ago

Mostly yes, but to your second paragraph:

> if instead one letter only 1/1000th of the time (aaa....aaa...aa..a.z.aaaa), then when the rare beast shows up, it is a big surprise, so the total amount of surprise available in the sequence is high

…when a Bernoulli distribution is skewed, the maximum surprise is high, yes, but the average surprise (= entropy) is low. The entropy of a Bernoulli distribution is maximized when p = 0.5 and falls off to either end:

https://en.wikipedia.org/wiki/Binary_entropy_function

For your examples, if the sequence is uniformly distributed (Bernoulli(1/2)), the entropy is log(2) ≈ 0.693 nats (exactly 1 bit) per symbol; if instead one letter occurs 1/1000th of the time, the entropy is about 0.0079 nats (≈ 0.0114 bits) per symbol.
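To check these figures, here is a short sketch of the binary entropy function H(p) = −p log p − (1 − p) log(1 − p). The unit depends on the log base: base 2 gives bits, base e gives nats. The function name is hypothetical.

```python
import math

def binary_entropy(p: float, base: float = 2.0) -> float:
    """Entropy of a Bernoulli(p) source: -p log p - (1-p) log(1-p).
    base=2 gives bits, base=math.e gives nats. Uses the convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p, base) + (1 - p) * math.log(1 - p, base))

for p in (0.5, 0.001):
    print(f"p={p}: {binary_entropy(p):.4f} bits, {binary_entropy(p, math.e):.4f} nats")
# p=0.5: 1.0000 bits, 0.6931 nats
# p=0.001: 0.0114 bits, 0.0079 nats
```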


Exuma 1080 days ago

Awesome, thank you very much. Great answer
