by leourbina 1032 days ago
One intuition is that KL-divergence represents a sort of “distance” between probability distributions. However, this isn’t quite right as it doesn’t satisfy some basic properties a real distance (a norm) would satisfy, including the fact that it isn’t symmetric: in general KL(Q, P) != KL(P, Q), and it does not satisfy the triangle inequality. Nonetheless, KL(P, Q) gives you a good idea of how “far” P is from Q: in the context of encoding, if you wanted to come up with an ideal encoding of symbols coming from P, but you guessed Q as the distribution of these symbols, then KL(P, Q) is the expected number of extra bits per symbol you’d have to use (taking logs base 2). One nice property is that KL(P, Q) = 0 exactly when P and Q are equal (almost everywhere, which for most applications is irrelevant). This makes it useful in the ML context, as you can minimize the KL divergence and know that the resulting “guessed” distribution is getting closer to the data distribution you’re trying to approximate using some parametrized function (an NN).
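To make the asymmetry and the "extra bits" reading concrete, here is a minimal sketch for discrete distributions. The function names and the two example distributions are made up for illustration, and it assumes Q assigns nonzero probability wherever P does.

```python
import math

def kl_divergence(p, q):
    """KL(P, Q) in bits for discrete distributions given as lists of probabilities."""
    # Terms with p_i == 0 contribute 0 by convention.
    # Assumes q_i > 0 wherever p_i > 0 (otherwise the divergence is infinite).
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """Expected bits per symbol when symbols come from P but the code is ideal for Q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Expected bits per symbol with a code that is ideal for P itself."""
    return cross_entropy(p, p)

# Hypothetical example distributions over three symbols.
p = [0.5, 0.25, 0.25]
q = [0.8, 0.1, 0.1]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
extra_bits = cross_entropy(p, q) - entropy(p)

print("KL(P, Q) =", kl_pq)
print("KL(Q, P) =", kl_qp)                    # differs from KL(P, Q)
print("extra bits per symbol =", extra_bits)  # matches KL(P, Q) up to rounding
```

The last line is the encoding interpretation: the cost of coding with the wrong guess Q minus the cost of the ideal code for P.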

> it doesn’t satisfy some basic properties a real distance (a norm) would satisfy, including the fact that it isn’t symmetric [...] and it does not satisfy the triangle inequality.

Not sure about "real" but one can have useful distances which are not symmetric like the distance between cities measured in time or in gallons.

It just needs to be clarified that KL divergence isn’t a proper mathematical norm, so it doesn’t behave the way we intuitively think a distance should. As mentioned, it doesn’t satisfy the triangle inequality, which is a basic property for any distance-like function.

In comparison, both of your examples are much closer to norms, as they both satisfy the triangle inequality.

For reference, this is what I’m referring to when I say a “norm”:

https://en.m.wikipedia.org/wiki/Norm_(mathematics)
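As a quick numerical check of the triangle inequality point, here is a small sketch using three Bernoulli distributions. The parameters 0.5, 0.25 and 0.1 are just an illustrative choice, and `kl_bernoulli` is a helper written for this example.

```python
import math

def kl_bernoulli(a, b):
    """KL(Bernoulli(a), Bernoulli(b)) in nats, assuming 0 < a < 1 and 0 < b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

# Illustrative success probabilities for P, Q and R.
p, q, r = 0.5, 0.25, 0.1

direct = kl_bernoulli(p, r)                      # "distance" from P straight to R
via_q = kl_bernoulli(p, q) + kl_bernoulli(q, r)  # going from P to Q, then Q to R

print("KL(P, R)            =", direct)
print("KL(P, Q) + KL(Q, R) =", via_q)
print("triangle inequality holds:", direct <= via_q)
```

For these values the direct divergence is larger than the two-step sum, so taking the "detour" through Q is cheaper than going straight from P to R, which a metric would not allow.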

I was just clarifying that norm and distance are not the same as you seemed to imply with "a real distance (a norm)". (And I think one can also intuitively understand that the distance from A to B may not be the same as the distance from B to A as soon as we step outside of geometry.)