In related news, OpenAI and Google have announced that their latest non-public models achieved gold-medal performance at the International Mathematical Olympiad: https://news.ycombinator.com/item?id=44614872
That said, the public models don't even get bronze.
Wow. That's an impressive result, though we definitely need some more details on how it was achieved.
What techniques were used? He references scaling up test-time compute, so I have to assume they threw a boatload of money at this. I've heard talk of running models in parallel and comparing results; if OpenAI ran this 10,000 times in parallel and cherry-picked the best answer, this is a lot less exciting.
If this is legit, then I really want to know what tools were used and how the model used them.