by diyer22
127 days ago
MathArena uses newly released competition sets and evaluates models close to the event. They also mark models released after the competition date as potentially contaminated. On Feb 6, Step 3.5 Flash took first place on the just-concluded AIME 2026 I. Step 3.5 Flash was released on Feb 1, before the competition, making cheating impossible.