by kraddypatties 39 days ago
I believe that’s _part_ of the point (or at least a side-effect) of the KL divergence loss term they have on the AV. That and training stability.
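
For anyone who hasn't run into this kind of term before, here's roughly what a KL penalty on a loss tends to look like. This is a generic sketch and not their actual code: the function names, the `beta` weight, and the choice of KL(trained || reference) as the direction are all my assumptions for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions over the same support.
    # eps guards against log(0); terms with p_i == 0 contribute nothing.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def regularized_loss(task_loss, trained_logits, reference_logits, beta=0.1):
    # Hypothetical combined objective: the task loss plus a penalty that
    # grows as the trained distribution drifts from a frozen reference.
    p = softmax(trained_logits)
    q = softmax(reference_logits)
    return task_loss + beta * kl_divergence(p, q)
```

The penalty is zero when the two distributions match and grows as the trained one moves away, which is why a term like this both constrains what the model outputs and tends to keep training from running off.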