by brap 642 days ago
I think it can be a tradeoff on the way to smaller models: use larger models trained on the whole internet to produce output, then train the smaller model on that output.
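A toy sketch of that idea, in case it helps. Everything here is an assumption made for illustration: `teacher` is a hypothetical stand-in for the large model (just a fixed function we can query), and the "small model" is a two-parameter linear fit trained only on the teacher's outputs, not on any original data.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for a large model: something we can only query."""
    return 3.0 * x + 2.0


def make_synthetic_dataset(n: int, seed: int = 0):
    """Sample inputs and label them with the teacher's output."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, teacher(x)) for x in xs]


def train_student(data, lr: float = 0.1, epochs: int = 500):
    """Fit a small linear student y = w*x + b by gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The student never sees "real" data, only what the teacher produced.
data = make_synthetic_dataset(200)
w, b = train_student(data)
```

In this toy case the student can match the teacher exactly because both are linear. With a real large model the student has less capacity, so it would only approximate the teacher, which is where the tradeoff comes in.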