Hacker News
by banachtarski 2956 days ago
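The other reply mentioning "Kullback-Leibler divergence" (aka KL divergence) points at the fundamental concept you need to understand. Cross-entropy between the true distribution and the model equals the entropy of the true distribution plus the KL divergence from the true distribution to the model. The entropy term doesn't depend on the model's parameters, so minimizing the KL divergence is equivalent to minimizing the given "cross-entropy loss" expression. More generally, to understand where this comes from, you'll want to read about information theory.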
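
If it helps, here's a rough numerical check of the identity H(p, q) = H(p) + KL(p || q) for discrete distributions. This is only a sketch: the distributions p and q are made-up examples, and the function names are mine, not from any particular library.

```python
import math

def entropy(p):
    # H(p) = -sum_i p_i * log(p_i), skipping zero-probability terms
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: p plays the role of the true distribution,
# q the model's predicted distribution.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

# Difference between the two sides of the identity; should be ~0
# up to floating-point rounding.
gap = cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))
print(abs(gap) < 1e-12)
```

Since entropy(p) is fixed by the data, any change to q that lowers cross_entropy(p, q) lowers kl_divergence(p, q) by the same amount, which is the sense in which the two objectives are equivalent.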