by wdrw
1030 days ago
For me, the intuitive way of understanding it is: "how badly would a gambler lose in the long term if they keep betting on a game believing the probability distribution is Y when it is in actual fact X?" It also explains why KL divergence is asymmetric, and why it goes to infinity (or is undefined) when the believed distribution has zeros where the true distribution has non-zeros.

Suppose an urn can have red, blue and green balls. If the true distribution (X) is that there are no red balls at all, but the gambler believes (Y) that there is a small fraction of red balls, the gambler would lose a bit of money with every bet on red, but overall the loss is finite. But suppose the gambler believes (Y) there are absolutely no red balls in the urn, when in actual fact (X) there is some small fraction of them. According to the gambler's beliefs it would be rational to gamble potentially infinite money on the ball not being red, so the loss is potentially infinite.

There is a parallel here to data compression, transmission, etc. (KL divergence between expected and actual distributions in information theory): if you believe a certain bit sequence will never occur in the input, you won't assign it a code, and so if it ever does occur you won't be able to transmit it at all ("infinite loss"). If you believe it will occur very infrequently, you will assign it a very long code, and so if it actually occurs very frequently your output will be very long (large loss, large KL divergence).
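To make the urn example concrete, here is a rough Python sketch using the standard discrete definition, D_KL(true || believed) = sum of p * log2(p / q). The specific probabilities (10% red and so on) are made up for illustration, and `kl_divergence` is a hypothetical helper written for this comment, not a library function.

```python
import math

def kl_divergence(true_dist, believed_dist):
    """D_KL(true || believed) in bits, for two aligned lists of probabilities."""
    total = 0.0
    for p, q in zip(true_dist, believed_dist):
        if p == 0:
            continue  # outcomes that never happen contribute nothing
        if q == 0:
            return math.inf  # believed impossible, but it actually occurs
        total += p * math.log2(p / q)
    return total

# Outcome order: red, blue, green (numbers are illustrative assumptions)
no_red = [0.0, 0.5, 0.5]
some_red = [0.1, 0.45, 0.45]

# Truth has no red, gambler believes in a little red: small, finite loss.
finite_loss = kl_divergence(no_red, some_red)

# Truth has some red, gambler believes red is impossible: infinite loss.
infinite_loss = kl_divergence(some_red, no_red)

print(finite_loss, infinite_loss)
```

Swapping the two arguments changes the answer from a small finite number to infinity, which is the asymmetry described above.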