by usernametaken29 63 days ago
If you ask me, the quickest way to explain KL divergence is this: if two distributions are the same, KL is 0. Otherwise, KL quantifies how many nats of difference there are between a target and a source. It’s always good to read through the original information-theoretic work. Most of AI is copycats with more compute anyway.
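For anyone who wants to poke at that claim, here is a minimal sketch for discrete distributions. The function name `kl_divergence` and the example lists `p` and `q` are made up for illustration, and it assumes `q` is nonzero wherever `p` is nonzero.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions.

    p and q are lists of probabilities over the same outcomes.
    Assumption: q[i] > 0 wherever p[i] > 0 (otherwise KL is infinite
    and this sketch would raise ZeroDivisionError).
    """
    # Terms with p[i] == 0 contribute nothing, by the convention 0 * log(0) = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]  # hypothetical "target" distribution
q = [0.9, 0.1]  # hypothetical "source" distribution

same = kl_divergence(p, p)  # identical distributions give exactly 0
pq = kl_divergence(p, q)    # roughly 0.51 nats
qp = kl_divergence(q, p)    # roughly 0.37 nats
```

Note that `pq` and `qp` come out different: KL is not symmetric, so which distribution you call the target matters.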

"Nats of difference" carries a lot of the load there. It's not incorrect, but I don't see how it's a better explanation than OP's.
Personally, I think the unit you measure divergence in just doesn’t matter. Yes, nats are technically superior, but as long as you're consistent, all you really want is to measure how similar A is to B. In that sense, I think many explanations of KL are very convoluted.
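To make the point about units concrete, a rough sketch: switching from nats to bits only changes the base of the logarithm, so every KL value is rescaled by the same constant, 1 / ln 2. The helper `kl` and the distributions `a`, `b`, `c` are hypothetical, and the sketch again assumes the second argument is nonzero wherever the first is.

```python
import math

def kl(p, q, base=math.e):
    """D_KL(p || q) for discrete distributions, in units set by the log base.

    base=math.e gives nats, base=2 gives bits.
    Assumption: q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

a = [0.5, 0.5]  # hypothetical reference distribution
b = [0.6, 0.4]  # close to a
c = [0.9, 0.1]  # far from a

nats_ab, nats_ac = kl(a, b), kl(a, c)
bits_ab, bits_ac = kl(a, b, base=2), kl(a, c, base=2)

# Same quantity, different units: bits = nats / ln(2).
scale_ok = math.isclose(bits_ac, nats_ac / math.log(2))

# The ordering of "closer" versus "further" is the same in either unit.
same_ranking = (nats_ab < nats_ac) == (bits_ab < bits_ac)
```

So as long as one base is used throughout, comparisons between distributions come out the same.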