HN Mirror

by npmipg 317 days ago

Hey, I'm the author of the post.

The image has been fixed. The point I'm making is that proprietary models are almost always ahead, and the gap is widening. OS models that come close in quality are usually distilled versions of proprietary models, or somehow get training data from them. Sometimes, after massive, expensive training runs, models are open sourced anyway, but at some point that becomes unsustainable.

The difference between a top model and a model with a similar Elo might seem small, but the value of even a marginal increase in intelligence is extremely high. For example, I only use the best coding model for coding, whatever the cost.
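For a rough sense of what a small rating gap means, here is a minimal sketch of the standard Elo expected-score formula. It assumes the usual 400-point logistic scale; the leaderboard behind the ratings may use a variant, and the ratings in the example are made up for illustration.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula.

    Roughly: the chance A's answer is preferred over B's, with ties
    counted as half a win. Assumes the common 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical ratings, chosen only to show how the gap maps to preference.
    base = 1300
    for gap in (0, 10, 20, 50, 100):
        p = elo_expected_score(base + gap, base)
        print(f"gap={gap:>3} Elo -> expected score {p:.3f}")
```

Under these assumptions a 20-point gap is only a few percentage points above a coin flip in head-to-head preference, which is why the gap looks small on a leaderboard even when it matters a lot for a specific task.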

There's also lots of evidence that large labs are only getting started. In the past year, they have secured massive amounts of compute, which is still not utilized well. I expect lots of big training runs in the future, which will widen the gap between OS and proprietary models even further.

The major problem for these companies is that they spend hundreds of millions of dollars training a model, and then someone comes in the next day and distills something almost as good for far less money (still a VERY large sum of money).

I don't know how this will be resolved long term.

2 comments

npmipg 317 days ago

Note that distilling a general model is several orders of magnitude more expensive than distilling a task-specific model, which is what I'm trying to promote here. Smart general models make it way easier to distill great task-specific models with no expert labelers.
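As a toy illustration of the idea (not the method from the post), here is a sketch of task-specific distillation with teacher-generated labels. Everything in it is an assumption: `teacher_label` is a hypothetical stand-in for a call to a large general model, the bug/feature task is invented, and the student is a deliberately tiny word-count classifier.

```python
from collections import Counter


def teacher_label(text: str) -> str:
    """Hypothetical stand-in for a large general model labeling one example."""
    bug_words = ("crash", "error", "fails")
    return "bug" if any(w in text.lower() for w in bug_words) else "feature"


def train_student(texts, labels):
    """Fit a tiny student: one word-count profile per teacher-assigned label."""
    profiles = {}
    for text, label in zip(texts, labels):
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def student_predict(profiles, text):
    """Pick the label whose profile shares the most word occurrences with the text."""
    words = text.lower().split()
    # sorted() makes tie-breaking deterministic (alphabetical by label).
    return max(sorted(profiles), key=lambda label: sum(profiles[label][w] for w in words))


# Unlabeled task data: no expert labelers involved, the teacher does the labeling.
unlabeled = [
    "app crash on startup",
    "login fails with error",
    "please add dark mode",
    "add export to csv",
]
labels = [teacher_label(t) for t in unlabeled]
student = train_student(unlabeled, labels)

print(student_predict(student, "crash on login"))
print(student_predict(student, "add dark theme"))
```

The student never sees a human label. It only learns from what the teacher said about the task data, which is the cheap part of the pipeline compared to distilling a general model.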

joshcartme 315 days ago

Thanks for clarifying!

I think I'm getting it now: OS models are getting closer, but only via distillation, not by training a new frontier model, which is out of reach for economic reasons.