HN Mirror

by zurfer 88 days ago

"In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh)."

I find that very surprising. This problem seemed out of reach 3 months ago, but now three frontier models are able to solve it.

Is everybody distilling each other's models? Are companies selling the same data and RL environments to all the big labs? Can anybody more involved share some rumors? :P

I do believe that AI can solve hard problems, but progress this evenly spread across labs in such a narrow domain makes me a bit suspicious that there is a hidden factor. Like, did some "data worker" solve a problem like that, and it's now in the training data?

3 comments

mike_hearn 88 days ago

Yes, there's a whole ecosystem of companies that create and sell RL gyms to AI labs, and of course the labs develop their own internally too. You don't hear much about this ecosystem because RL at scale is all private. There's nearly no academic research on it.

A lot of this is probably just throwing roughly equal amounts of compute at continuous RLVR training. I'm not convinced there's any big research breakthrough that separates GPT 5.4 from 5.2. The diff is probably more than just checkpoints but less than neural architecture changes and more towards the former than the latter.

I think it's just easy to underestimate how much impact continuous training+scaling can have on the underlying capabilities.

slopinthebag 88 days ago

Is it possible the AI labs are seeding their models with these solved problems? Like, if I was Sam Altman with a bazillion dollars of investment I would pay some mathematicians to solve some of these problems so that the models could "solve" them later on. Not that I think it's what's happening here of course...

But it is pretty funny how 5.4 miscounted the number of 1's in 18475838184729 on the same day it solved this.
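
For reference, that digit count is easy to check mechanically. A minimal Python sketch (the helper name `count_digit` is made up for illustration, not anything from the thread):

```python
def count_digit(number: int, digit: int) -> int:
    """Count how many times a single decimal digit appears in a non-negative integer."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    # Work on the decimal string form and count matching characters.
    return str(abs(number)).count(str(digit))


ones = count_digit(18475838184729, 1)
print(ones)
```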

mrtesthah 87 days ago

Maybe so, but GPT 5.4 is absolutely pulling ahead. You can see the differences visually on https://minebench.ai/.
