by enthdegree 3494 days ago
KL divergence has no nice theoretical properties other than 'it is the answer to these questions'.

It is also extremely poorly behaved numerically and in convergence.


I am sorry but I have to call bullshit on this.

To give just a taste of the nice properties of KL: if you are using a one-layer NN with the sigmoid function as the transform, using square loss gives you an explosion of local minima. OTOH using KL in its place would have given you none. Numerical accuracy is pretty much a non-issue; people have known how to handle KL numerically for the last 40 or so years.
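For the numerical point, here is a minimal sketch of one common trick, assuming a binary target y in [0, 1] and a single sigmoid output with logit z. The function names are made up for illustration. For fixed y, the cross-entropy differs from KL(y || sigmoid(z)) only by the entropy of y, which does not depend on z, so minimizing one minimizes the other.

```python
import math

def sigmoid_cross_entropy(z, y):
    """Cross-entropy between target y in [0, 1] and sigmoid(z).

    Works directly on the logit z via the identity
        CE = softplus(z) - y * z,
        softplus(z) = max(z, 0) + log(1 + exp(-|z|)),
    so exp() is never called on a large positive argument and
    log() is never called on 0.
    """
    return max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))

def naive_cross_entropy(z, y):
    """Textbook formula, shown only for comparison.

    For large |z| the sigmoid saturates to exactly 0.0 or 1.0 in floating
    point (or exp() overflows), and the logarithm then fails.
    """
    s = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(s) + (1.0 - y) * math.log(1.0 - s))
```

For moderate logits the two functions agree; for something like z = 40 with y = 0 the naive version ends up taking log(0), while the logit form just returns a value close to 40. The second derivative of the logit form with respect to z is sigmoid(z) * (1 - sigmoid(z)), which is never negative, so for a single sigmoid unit this loss is convex in the weights.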

BTW using KL on equal-variance Gaussians gives you square loss, apparently the loss you prefer.
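To spell that out, a short derivation, assuming both Gaussians share the same covariance \sigma^2 I (the symbols p, q, \mu_1, \mu_2 and \sigma are introduced here for illustration):

```latex
\begin{aligned}
p &= \mathcal{N}(\mu_1, \sigma^2 I), \qquad q = \mathcal{N}(\mu_2, \sigma^2 I), \\
\log \frac{p(x)}{q(x)}
  &= \frac{\lVert x - \mu_2 \rVert^2 - \lVert x - \mu_1 \rVert^2}{2\sigma^2}, \\
\mathrm{KL}(p \,\|\, q)
  &= \mathbb{E}_{x \sim p}\!\left[ \log \frac{p(x)}{q(x)} \right]
   = \frac{\bigl(d\,\sigma^2 + \lVert \mu_1 - \mu_2 \rVert^2\bigr) - d\,\sigma^2}{2\sigma^2}
   = \frac{\lVert \mu_1 - \mu_2 \rVert^2}{2\sigma^2}.
\end{aligned}
```

Here d is the dimension of x. The normalizing constants cancel because the variances match, and what is left is the squared distance between the means up to the factor 1/(2\sigma^2). In this special case the result is also symmetric in \mu_1 and \mu_2.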

if your problem is ok with the asymmetry of KLD