by joshcartme 317 days ago
Maybe I'm totally misreading this, but it seems like the post contradicts itself. At the beginning of the third paragraph:

> Impressively, open source models have been able to quickly catch up to big labs.

And then the beginning of the fourth:

> Open-source has been lagging behind proprietary models for years, but lately this gap has been widening.

Followed by a picture that is more or less inscrutable.


Hey, I'm the author of the post.

The image has been fixed. The point I'm making is that proprietary models are almost always ahead, and this gap is widening. OS models that are nearly at the same quality are usually distilled versions of proprietary models, or otherwise get training data from them. Sometimes models are open sourced anyway after massive, expensive training runs, but at some point that becomes unsustainable.

The difference between a top model and a model with a similar Elo might seem small, but the value of even a marginal increase in intelligence is extremely high. For example, I only use the best coding model for coding, whatever the cost.

There's also lots of evidence that large labs are only getting started. In the past year, they have secured massive amounts of compute, much of which is still not utilized well. I expect lots of big training runs in the future, which will widen the gap between OS and proprietary models further.

The major problem for these companies is that they spend hundreds of millions of dollars training a model, and then someone comes in the next day and distills something almost as good for far less money (though still a VERY large sum of money).

I don't know how this will be resolved long term.

Note that distilling a general model is several orders of magnitude more expensive than distilling a task-specific model, which is what I'm trying to promote here. Smart general models make it way easier to distill great task-specific models with no expert labelers.

Thanks for clarifying!

I think I'm getting it now: OS models are getting closer, but only via distillation, not by training a new frontier model, which is out of reach for economic reasons.

> Followed by a picture that is more or less inscrutable.

Yeah. Just to make it explicit - that chart has DeepSeek R1 at ... presumably an Elo of 1418 and Gemini Pro at 1463. That is comparable to the gap between Magnus Carlsen and Fabiano Caruana [0]. I don't think it is reasonable to complain about that sort of performance gap in practice - it is a capable model. Looking at the spread of scores, I don't immediately see why someone even needs to use something in the Top 10; presumably anything above 1363 would be good enough for business, research and personal use.
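
To put a number on that gap, here is a rough sketch of what a 45-point difference means as a head-to-head preference rate. It assumes the leaderboard follows the standard Elo logistic formula with a 400-point scale, which the chart itself doesn't confirm; the function name `expected_score` is just an illustrative choice.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic model.

    Assumption: the leaderboard's ratings follow the usual chess-style
    formula with a 400-point scale. That is not confirmed by the chart.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Ratings as read off the chart in the comment above.
    gemini_pro = 1463
    deepseek_r1 = 1418
    cutoff = 1363  # the "good enough" threshold suggested above

    print(f"Gemini Pro vs DeepSeek R1: {expected_score(gemini_pro, deepseek_r1):.3f}")
    print(f"Gemini Pro vs 1363 cutoff: {expected_score(gemini_pro, cutoff):.3f}")
```

Under that assumption, a 45-point gap works out to the higher-rated model being preferred about 56% of the time, and a 100-point gap to about 64%. Whether that margin matters is exactly what the two sides of this thread disagree on.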

None of these models have even been around that long; DeepSeek was only released in January. The rate of change is massive, and I expect to have access to an open source model that is better than anything on this leaderboard sometime next year.

[0] https://2700chess.com/