by kraddypatties 39 days ago
I believe that’s _part_ of the point (or at least a side-effect) of the KL divergence loss term they have on the AV. That and training stability.
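For anyone who hasn't seen this kind of term before, here is a minimal sketch of what a KL divergence penalty added to a loss can look like. It is not the setup from the work being discussed: the function names, the categorical distributions, the KL direction, and the `beta` weight are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two categorical distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def total_loss(task_loss, model_logits, reference_logits, beta=0.1):
    """Hypothetical combined loss: task loss plus a weighted KL penalty.

    The penalty grows as the model's output distribution drifts away from
    the reference distribution; beta (an assumed value) sets how strongly
    that drift is discouraged.
    """
    p = softmax(model_logits)
    q = softmax(reference_logits)
    return task_loss + beta * kl_divergence(p, q)
```

The penalty is zero when the two distributions match and positive otherwise, so it pulls the trained model back toward the reference, which is also why terms like this tend to help with training stability.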