Hacker News
by 317070 29 days ago
It's been used, along with every other divergence and distance you can think of.

In practice, which divergence you use doesn't seem to matter much. The KL is the one with the strongest theoretical foundation, though, i.e. it will work with infinite data. The important aspect seems to be that neural networks are Lipschitz-bounded, and that this is the most important constraint preventing collapse.
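
For anyone who wants to poke at "which divergence" concretely, here is a rough sketch comparing a few of them on two small discrete distributions. The distributions `p` and `q` and the helper names `kl`, `js`, and `tv` are made up for illustration; nothing here is specific to any particular model or paper.

```python
import math

def kl(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists.
    Assumes q[i] > 0 wherever p[i] > 0 (otherwise the divergence is infinite)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: a symmetrized KL against the mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def tv(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical example distributions over three outcomes.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

print("KL(p||q):", kl(p, q))
print("KL(q||p):", kl(q, p))  # KL is not symmetric, so this differs
print("JS(p,q): ", js(p, q))
print("TV(p,q): ", tv(p, q))
```

All of these are zero when `p == q` and grow as the distributions move apart, which is one way to see why swapping one for another often changes less than you might expect.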