Hacker News
Open model StepFun-3.5 is #1 on MathArena, an uncheatable math benchmark (twitter.com)
3 points by diyer22 122 days ago
1 comment

How is it "uncheatable"? If you know the exact olympiad questions the model is being assessed on, what's stopping you from massaging the model until it gets more of them right than the previous number 1?
MathArena uses newly released competition sets and evaluates models close to the event. They also mark models released after the competition date as potential contamination.

On Feb 6, on the just-concluded AIME 2026 I, Step 3.5 Flash took first place. Step 3.5 Flash was released on Feb 1, before the competition, making cheating impossible.
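
To make the date rule concrete, here is a rough sketch of the check as I understand it. This is not MathArena's actual code: the function name `potentially_contaminated` is made up for illustration, and treating Feb 6 as the competition date is an assumption taken from the dates above.

```python
from datetime import date


def potentially_contaminated(model_release: date, competition_date: date) -> bool:
    """Hypothetical helper: flag a model released after the competition date.

    A model released after the questions exist could have seen them in
    training, so it is marked as potential contamination.
    """
    return model_release > competition_date


# Dates from this thread; the competition date is an assumption.
step_release = date(2026, 2, 1)
aime_date = date(2026, 2, 6)

print(potentially_contaminated(step_release, aime_date))  # False
```

A model released on, say, Mar 1 would be flagged by the same check.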