Hacker News
by brap 642 days ago
I think it can be a tradeoff to get to smaller models: use larger models trained on the whole internet to produce output, then train the smaller model on that output.
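To make that concrete, here is a toy sketch of the idea, not anyone's actual pipeline. Everything in it is a made-up stand-in: `teacher` plays the role of the large model (a fixed function we can only query), `generate_synthetic_data` collects its outputs, and `train_student` fits a tiny two-parameter model on those outputs alone, never touching any original training data.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for a large model: we can only query its output."""
    return 3.0 * x + 2.0


def generate_synthetic_data(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Sample inputs and label them with the teacher's outputs."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, teacher(x)) for x in xs]


def train_student(
    data: list[tuple[float, float]], lr: float = 0.1, epochs: int = 500
) -> tuple[float, float]:
    """Fit a small linear student y = w*x + b by gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The student only ever sees teacher-generated (input, output) pairs.
synthetic = generate_synthetic_data(200)
w, b = train_student(synthetic)
print(f"student learned w={w:.3f}, b={b:.3f}")
```

In a real setting the teacher would be a large language model and the student a smaller network, but the shape is the same: query the big model, keep its outputs, train the small one on them. The tradeoff is that the student can only be as good as the teacher's outputs on the inputs you chose to sample.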