Thanks for sharing! Even though Shannon's "A Mathematical Theory of Communication" is so accessible, I find that most people in our field (stats/ML) rarely think through information-theoretic tools from first principles.
Yes, KL divergences show up everywhere, but they are not derived from scratch often enough. Maybe I'm stifled by my campus bubble though :)