| HN Mirror

That's the billion dollar question. These are all research models, the point was to see what happens when you keep training a smaller model.

My best guess (and if I had a concrete answer I'd be out building it) is that, absent a breakthrough, smaller models will be mostly for downstream tasks, like classifiers, that aren't generative. Or fine tuned for specialized generative models that only know one domain. I don't know how well this works for real use cases, but certainly way smaller models generate Shakespeare-like text for example, I don't actually know why you'd do that though.