
Hacker News


	by jostmey 3495 days ago
	Why not KL divergence, which measures how far the current distribution is from a target distribution? From the perspective of information theory, it is the best error measure. Oh, and let's not forget that for a lot of problems, minimizing the KL divergence is exactly the same operation as maximizing the likelihood function.
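A minimal sketch of the KL/likelihood equivalence, assuming p is the empirical distribution of a hypothetical sample and q ranges over a few hypothetical candidate models. Since KL(p || q) = -H(p) - Σ p(x) log q(x), and the second sum is just the average log-likelihood of the sample under q, the sum KL + average log-likelihood is the same constant -H(p) for every q. The candidate with the smallest KL is therefore the one with the largest likelihood.

```python
import math
from collections import Counter

def empirical(samples):
    # Empirical distribution: relative frequency of each observed value.
    n = len(samples)
    return {x: c / n for x, c in Counter(samples).items()}

def kl(p, q):
    # KL(p || q), summed over the support of p (0 log 0 = 0); q must be > 0 there.
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def avg_log_likelihood(samples, q):
    # Mean log-probability the model q assigns to the observed samples.
    return sum(math.log(q[x]) for x in samples) / len(samples)

def entropy(p):
    return -sum(px * math.log(px) for px in p.values() if px > 0)

samples = ["a", "a", "a", "b", "c", "c"]
p = empirical(samples)

# Hypothetical candidate models over the same three outcomes.
candidates = {
    "uniform": {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3},
    "skewed": {"a": 0.5, "b": 0.2, "c": 0.3},
    "match": {"a": 0.5, "b": 1 / 6, "c": 1 / 3},
}

for name, q in candidates.items():
    kl_val = kl(p, q)
    ll = avg_log_likelihood(samples, q)
    # kl_val + ll equals -entropy(p) for every candidate q.
    print(f"{name:8s} KL={kl_val:.4f}  avg loglik={ll:.4f}  sum={kl_val + ll:.4f}")
```

Because the entropy of the data does not depend on the model, ranking models by KL and ranking them by likelihood give the same order.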

1 comments

enthdegree 3494 days ago

KL divergence has no nice theoretical properties other than 'it is the answer to these questions'.

It is also extremely poorly behaved, both numerically and in convergence.
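On the numerical side, the usual way to keep KL well behaved is to compute it in log space. Here is a minimal sketch, assuming two categorical distributions given as unnormalized logits (the names `log_softmax` and `kl_from_logits` are our own, not from any library). Subtracting the maximum logit before exponentiating avoids overflow, and terms where p underflows to zero contribute nothing instead of producing `log(0)`.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating so exp() never overflows.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def kl_from_logits(logits_p, logits_q):
    """KL(p || q) for two categorical distributions given as logits."""
    log_p = log_softmax(logits_p)
    log_q = log_softmax(logits_q)
    # Stay in log space; when exp(lp) underflows to 0.0 the term is simply 0.
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

# A naive softmax would call math.exp(1000.0) here and raise OverflowError.
print(kl_from_logits([1000.0, 0.0], [0.0, 1000.0]))  # prints 1000.0
```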

srean 3494 days ago

I am sorry but I have to call bullshit on this.

To give just a taste of the nice properties of KL: if you are using a single-layer NN with the sigmoid function as the transform, using square loss gives you an explosion of local minima. OTOH, using KL in its place would have given you none, because the loss is then convex in the weights. Numerical accuracy is pretty much a non-issue; people have known how to handle KL numerically for the last 40 or so years.
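To make the convexity point concrete, here is a small one-dimensional sketch: a single sigmoid neuron, one hypothetical training example (x = 1, y = 1), and the smallest numerical second derivative of each loss over a grid of weights. With a 0/1 target, KL to the target equals binary cross-entropy, which is what the sketch uses. It only illustrates the curvature difference; the many-local-minima behaviour of square loss needs a data set with several examples, which this does not reproduce. For cross-entropy, convexity in w holds for any data set, since each term is a convex function of the linear score w·x.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def square_loss(w, x=1.0, y=1.0):
    # Squared error between the neuron's output and the target.
    return (sigmoid(w * x) - y) ** 2

def cross_entropy_loss(w, x=1.0, y=1.0):
    # Binary cross-entropy (KL to a 0/1 target), written with log1p:
    # -log(sigmoid(z)) = log(1 + e^-z),  -log(1 - sigmoid(z)) = log(1 + e^z).
    z = w * x
    return y * math.log1p(math.exp(-z)) + (1 - y) * math.log1p(math.exp(z))

def min_curvature(loss, lo=-6.0, hi=6.0, steps=1200, h=1e-3):
    # Smallest central second difference of loss(w) on a grid of weights.
    worst = float("inf")
    for i in range(steps + 1):
        w = lo + (hi - lo) * i / steps
        worst = min(worst, (loss(w + h) - 2 * loss(w) + loss(w - h)) / h**2)
    return worst

print("square loss   min curvature:", min_curvature(square_loss))         # negative: non-convex
print("cross-entropy min curvature:", min_curvature(cross_entropy_loss))  # positive: convex
```

Analytically, the square loss here has second derivative -2σ(1-σ)²(1-3σ), which is negative whenever σ(w) < 1/3, while the cross-entropy has second derivative σ(1-σ) > 0 everywhere.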

BTW, using KL on equal-variance Gaussians gives you square loss (scaled by 1/(2σ²)), apparently the loss you prefer.
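For two univariate Gaussians, KL(N(μ_p, σ_p²) || N(μ_q, σ_q²)) = log(σ_q/σ_p) + (σ_p² + (μ_p - μ_q)²)/(2σ_q²) - 1/2. With σ_p = σ_q = σ, the log term vanishes and the σ² terms cancel the 1/2, leaving (μ_p - μ_q)²/(2σ²): square loss times a constant. A minimal check (the function name is our own):

```python
import math

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) for univariate Gaussians."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)

sigma = 2.0  # shared standard deviation (the "equal-variance" case)
for mu_p, mu_q in [(0.0, 0.0), (3.0, 1.0), (-1.5, 2.5)]:
    kl = kl_gaussian(mu_p, sigma, mu_q, sigma)
    scaled_square = (mu_p - mu_q) ** 2 / (2 * sigma ** 2)
    # The two columns agree: KL reduces to a scaled squared difference of means.
    print(f"mu_p={mu_p:5.1f} mu_q={mu_q:5.1f}  KL={kl:.4f}  scaled square={scaled_square:.4f}")
```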

bwwvbiwbw 3493 days ago

That holds if your problem is OK with the asymmetry of the KL divergence.
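The asymmetry is easy to see with small discrete distributions. In this sketch (the distributions are hypothetical), swapping the arguments changes the value, and when one distribution puts mass where the other has none, one direction is finite while the other is infinite.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue           # 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf    # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

p = [0.9, 0.1]
q = [0.5, 0.5]
print(kl(p, q), kl(q, p))  # two different values

r = [1.0, 0.0]
print(kl(r, q), kl(q, r))  # finite (log 2) vs. inf
```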