by programjames 47 days ago
Don't they add a KL loss term against the frozen model's outputs?
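
For anyone unsure what I mean, here is a rough sketch of that kind of penalty. It is not taken from any particular paper or codebase. The function names, the `beta` weight, and the KL direction (trainable model relative to the frozen one) are all my own assumptions for illustration.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of raw scores.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(logp, logq):
    # KL(p || q) for two discrete distributions given as log-probabilities.
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))

def loss_with_kl_penalty(task_loss, policy_logits, frozen_logits, beta=0.1):
    # Hypothetical helper: adds beta * KL(policy || frozen) to the task loss.
    # beta and the KL direction are assumptions, not a known recipe.
    logp = log_softmax(policy_logits)   # model being trained
    logq = log_softmax(frozen_logits)   # frozen reference copy
    return task_loss + beta * kl_divergence(logp, logq)

# The penalty is zero when both models agree and grows as they diverge.
same = loss_with_kl_penalty(1.0, [1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
drifted = loss_with_kl_penalty(1.0, [3.0, 0.0, 0.0], [0.0, 0.0, 3.0])
```

The idea is that the extra term pulls the trained model back toward the frozen one, and `beta` controls how hard it pulls.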