by usernametaken29
63 days ago
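If you ask me, the quickest way to explain KL divergence goes like this:
If two distributions are the same, KL is 0.
Otherwise, KL quantifies how many nats of difference there are between a target and a source (nats because of the natural log; with log base 2 you get bits). It is not symmetric, so swapping target and source generally gives a different number.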
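To make that concrete, here is a minimal sketch for discrete distributions. It assumes the "target" plays the role of P and the "source" the role of Q in D_KL(P || Q) = sum_i p_i * ln(p_i / q_i); the function name `kl_divergence` and the example probabilities are made up for illustration.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions given as
    equal-length sequences of probabilities (hypothetical helper)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * ln(0 / q) is taken as 0 by convention
        if qi == 0.0:
            return math.inf  # P puts mass where Q puts none
        total += pi * math.log(pi / qi)  # natural log, so the unit is nats
    return total


# Illustrative numbers only: assumed target P and source Q.
target = [0.5, 0.5]
source = [0.9, 0.1]

same = kl_divergence(target, target)      # identical distributions
forward = kl_divergence(target, source)   # D_KL(target || source)
reverse = kl_divergence(source, target)   # D_KL(source || target)

print(same, forward, reverse)
```

With these numbers the identical case comes out as exactly 0, the forward direction is about 0.51 nats, and the reverse direction is about 0.37 nats, which shows the asymmetry.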
It’s always good to read through the original information-theoretic work. Most of AI is copycats with more compute anyway.