HN Mirror



	by usernametaken29 63 days ago
	If you ask me, the quickest way to explain KL divergence is this: if two distributions are the same, KL is 0. Otherwise, KL quantifies how many nats of difference there are between a target and a source. It’s always good to read through the original information-theoretic work. Most of AI is copycats with more compute anyway.
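	A minimal sketch of that description for discrete distributions, using only the standard library. The function name `kl_divergence` and the example probabilities are made up for illustration; the natural log is what makes the result come out in nats.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            # Terms with p_i = 0 contribute nothing, by convention.
            continue
        if qi == 0:
            # p puts mass where q has none: the divergence is infinite.
            return math.inf
        total += pi * math.log(pi / qi)  # natural log, so the unit is nats
    return total

# Illustrative distributions (assumed values, not from the thread).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

same = kl_divergence(p, p)  # identical distributions
diff = kl_divergence(p, q)  # different distributions

print(same, diff)
```

	Identical inputs give exactly 0, and any mismatch gives a positive number of nats.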


chermi 63 days ago

"Nats of difference" carries a lot of the load there. It's not incorrect, but I don't see how it's a better explanation than OP's?

usernametaken29 63 days ago

Personally, I think the unit you measure divergence in just doesn’t matter. Yes, nats are technically superior, but as long as you are consistent, all you really want is to measure how similar A is to B. In that sense, I think many explanations of KL are very convoluted.
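
To make the point about units concrete, here is a small sketch showing that switching from nats to bits only rescales KL by a constant factor of ln 2, so comparisons come out the same either way. The helper `kl` and the example distributions are hypothetical, and the sketch assumes every entry of `q` is strictly positive.

```python
import math

def kl(p, q, base=math.e):
    """KL(p || q) for discrete distributions; base e gives nats, base 2 gives bits.

    Assumes every q_i is strictly positive (an assumption of this sketch).
    """
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions (assumed values, not from the thread).
p = [0.5, 0.3, 0.2]
q_near = [0.4, 0.4, 0.2]
q_far = [0.1, 0.1, 0.8]

nats_near, nats_far = kl(p, q_near), kl(p, q_far)
bits_near, bits_far = kl(p, q_near, 2), kl(p, q_far, 2)

# The two units differ only by the constant factor ln 2.
ratio_near = nats_near / bits_near
ratio_far = nats_far / bits_far

# Whichever unit is used, q_near is judged closer to p than q_far.
same_ordering = (nats_near < nats_far) == (bits_near < bits_far)

print(ratio_near, ratio_far, math.log(2), same_ordering)
```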