by jostmey 3495 days ago
Why not KL-Divergence, which measures the error between a target distribution and the current distribution? From the perspective of Information Theory, it is the best error measurement.

Oh, and let's not forget that for a lot of problems minimizing the KL-divergence is the exact same operation as maximizing the likelihood function.
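To make the equivalence concrete, here is a minimal sketch for discrete distributions. The helper names (`kl_divergence`, `avg_log_likelihood`, `entropy`) and the toy numbers are made up for illustration. It relies on the identity KL(p || q) = -H(p) - E_p[log q]: since H(p) does not depend on q, the q that minimizes KL is the q that maximizes the average log-likelihood.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def avg_log_likelihood(p_empirical, q):
    """Average log-likelihood of data under model q, where the data is
    summarized by its empirical frequencies p_empirical."""
    return sum(pi * math.log(qi) for pi, qi in zip(p_empirical, q) if pi > 0)

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical empirical distribution and a few candidate models.
p = [0.5, 0.3, 0.2]
candidates = [[0.4, 0.4, 0.2], [0.5, 0.3, 0.2], [0.6, 0.2, 0.2]]

# KL(p || q) + H(p) + avg_log_likelihood(p, q) should be 0 for every q.
residuals = [
    kl_divergence(p, q) + entropy(p) + avg_log_likelihood(p, q)
    for q in candidates
]

# Both criteria pick the same candidate.
best_by_kl = min(candidates, key=lambda q: kl_divergence(p, q))
best_by_ll = max(candidates, key=lambda q: avg_log_likelihood(p, q))
print(best_by_kl == best_by_ll)
```

This only covers the case where the target is the empirical distribution of the data, which is the usual setting in which the two objectives coincide.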


kl divergence has no nice theoretical properties other than 'it is the answer to these questions'

it is also extremely poorly behaved numerically and in convergence

I am sorry but I have to call bullshit on this.

To give just a taste of the nice properties of KL: if you are using a single-layer NN with the sigmoid function as the transform, using square loss gives you an explosion of local minima. OTOH using KL in its place would have given you none. Numerical accuracy is pretty much a non-issue; people have known how to handle KL numerically for the last 40 or so years.
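One common way the numerical side is handled (a sketch of the standard log-sum-exp trick, not necessarily what the commenter had in mind) is to compute the model's log-probabilities directly from the logits instead of taking the log of a softmax output. The function names and the extreme logits below are made up for illustration.

```python
import math

def log_softmax(logits):
    """Log-probabilities from logits via the log-sum-exp trick:
    subtracting the max keeps every exp() argument <= 0, so nothing overflows."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def kl_from_logits(target, logits):
    """KL(target || softmax(logits)) computed in log space.
    Terms with target_i == 0 contribute 0."""
    log_q = log_softmax(logits)
    return sum(t * (math.log(t) - lq) for t, lq in zip(target, log_q) if t > 0)

# Extreme logits: a naive softmax would call math.exp(1000.0), which raises
# OverflowError, and the probability of the second class would underflow to 0.
target = [0.9, 0.1]
logits = [1000.0, -1000.0]
kl = kl_from_logits(target, logits)
print(math.isfinite(kl))
```

The same idea is what library routines for cross-entropy on logits typically do internally, which is why the loss rarely needs special care in practice.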

BTW using KL on equal-variance Gaussians gives you square loss, apparently the loss you prefer.
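For reference, the Gaussian claim can be checked directly. Assuming two univariate Gaussians that share the same variance \sigma^2 (the symbols p, q, \mu_1, \mu_2 are introduced here for illustration), the log-ratio of the densities is linear in x, and taking its expectation under p leaves only a scaled squared difference of the means:

```latex
\begin{aligned}
p(x) &= \mathcal{N}(x;\,\mu_1,\sigma^2), \qquad q(x) = \mathcal{N}(x;\,\mu_2,\sigma^2) \\
\log\frac{p(x)}{q(x)} &= \frac{(x-\mu_2)^2 - (x-\mu_1)^2}{2\sigma^2}
  = \frac{2x(\mu_1-\mu_2) + \mu_2^2 - \mu_1^2}{2\sigma^2} \\
\mathrm{KL}(p\,\|\,q) &= \mathbb{E}_{p}\!\left[\log\frac{p(x)}{q(x)}\right]
  = \frac{2\mu_1(\mu_1-\mu_2) + \mu_2^2 - \mu_1^2}{2\sigma^2}
  = \frac{(\mu_1-\mu_2)^2}{2\sigma^2}
\end{aligned}
```

So with the variance held fixed, minimizing KL over the mean is the same as minimizing squared error, up to the constant factor 1/(2\sigma^2).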

if your problem is ok with the asymmetry of KLD