Hacker News
by richard___ 40 days ago
Why is self-distillation necessary? Why can't they get the ground truth for "skipping" steps?