Hacker News
by canyon289 722 days ago
Hi, I work on the Gemma team (same as Alek; opinions are my own).

Essentially, instead of relying only on the tokens that are "already there" in text, distillation lets us simulate training data from a larger model.
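
To make that concrete, here is a toy sketch of the general idea, not the actual Gemma training code. It assumes a three-word vocabulary and made-up logits for a hypothetical "teacher" and "student"; all names and numbers are illustrative. The hard target only says which token appeared in the text, while the soft target carries the larger model's full next-token distribution.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, student_logits):
    """Cross-entropy of the student's prediction against a target distribution."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s)
                for t, s in zip(target_probs, student_probs) if t > 0)

# Hypothetical vocabulary and logits, chosen only for illustration.
vocab = ["cat", "dog", "car"]
student_logits = [1.0, 0.5, -1.0]
teacher_logits = [2.0, 1.5, -2.0]

# Ordinary next-token training: the target is the single token found in the text.
hard_target = [1.0, 0.0, 0.0]  # the text said "cat"
hard_loss = cross_entropy(hard_target, student_logits)

# Distillation: the target is the teacher's whole distribution over the vocabulary.
soft_target = softmax(teacher_logits)
soft_loss = cross_entropy(soft_target, student_logits)

for word, p in zip(vocab, soft_target):
    print(f"{word}: teacher probability {p:.3f}")
print(f"hard-label loss: {hard_loss:.3f}")
print(f"distillation loss: {soft_loss:.3f}")
```

With the soft target, the student also gets signal about tokens that did not appear in the text (here, that "dog" is nearly as plausible as "cat"), which is information a single observed token cannot provide.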