Hacker News
by roenxi 317 days ago
> Followed by a picture that is more or less inscrutable.

Yeah. Just to make it explicit: that chart has DeepSeek R1 at what is presumably an Elo of 1418 and Gemini Pro at 1463, a gap of 45 points. That is comparable to the gap between Magnus Carlsen and Fabiano Caruana [0]. I don't think it is reasonable to complain about that sort of performance gap in practice; it is a capable model. Looking at the spread of scores, I don't immediately see why anyone needs to use something in the top 10. Presumably anything above 1363 would be good enough for business, research and personal use.
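To put those gaps in head-to-head terms, here is a rough sketch. It assumes the chart's scores sit on the standard Elo scale (base-10 logistic, 400-point spread), which may not exactly match how the leaderboard fits its ratings. The function name `expected_score` is illustrative. Under that assumption, a 45-point gap means the higher-rated model is expected to score only about 56% against the lower-rated one, and even the 100-point gap down to 1363 is only about 64%.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumption: ratings use the usual base-10 logistic with a 400-point scale.
    A draw counts as half a point, so 0.5 means evenly matched.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as read off the chart in the comment above.
gemini_pro = 1463
deepseek_r1 = 1418
good_enough = 1363  # the "good enough" cut-off suggested above

for name, rating in [("DeepSeek R1", deepseek_r1), ("a 1363-rated model", good_enough)]:
    p = expected_score(gemini_pro, rating)
    print(f"Gemini Pro vs {name}: gap {gemini_pro - rating}, expected score {p:.3f}")
# Gemini Pro vs DeepSeek R1: gap 45, expected score 0.564
# Gemini Pro vs a 1363-rated model: gap 100, expected score 0.640
```

In chess terms, an expected score near 0.56 is a modest edge, not a blowout. That is the point being made about the R1 vs Gemini Pro gap.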

None of these models have been around for long; DeepSeek R1 was only released in January. The rate of change is massive, and I expect to have access to an open-source model that beats everything on this leaderboard sometime next year.

[0] https://2700chess.com/