Hacker News
by falcor84 131 days ago
How is it "uncheatable"? If you know the exact olympiad questions it's being assessed on, what's stopping you from massaging the model until it gets more of them right than the previous number 1?
1 comment

MathArena uses newly released competition sets and evaluates models close to the event. They also mark models released after the competition date as potential contamination.

On Feb 6, in the just-concluded AIME 2026 I, Step 3.5 Flash took first place. Step 3.5 Flash was released on Feb 1, five days earlier, making cheating impossible.
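
For what it's worth, the flagging rule described above boils down to a date comparison. Here is a rough sketch of it. The function name is my own, not MathArena's actual code, and I'm assuming Feb 6, 2026 is the competition date and Feb 1, 2026 is the release date:

```python
from datetime import date


def potentially_contaminated(model_release: date, competition_date: date) -> bool:
    """Hypothetical helper: flag a model released after the competition date."""
    return model_release > competition_date


# Dates taken from the comment above; the year 2026 is an assumption.
competition = date(2026, 2, 6)
step_release = date(2026, 2, 1)

print(potentially_contaminated(step_release, competition))  # False
```

A model released on, say, Feb 10 would come back True under the same check, which is the case that gets marked as potential contamination.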