by nuz 701 days ago
My guess would be lots and lots of synthetic data, generated by the bigger models and used to train the smaller ones.