Hacker News
by francesco 495 days ago
It looks to me like the performance reported for rStar-Math (both in the table and in the graph) is incorrect. With a single rollout at test time, rStar-Math achieves 50 on AIME 2024, not 26.7 as you reported. On Olympiad Bench it achieves 65.3, not 47.1 as you reported. On AMC 2023 it achieves 87.5, not 47.5 as you reported. It outperforms your model across the board. Am I reading something incorrectly?