by hooloovoo_zoo
197 days ago
For one thing, it's not a real score: they graded the results themselves, and Putnam judges are notoriously tough. Among the top 500 humans, there was not a single 8 on the problem they claim partial credit for (nor any partial credit above a 2); see https://kskedlaya.org/putnam-archive/putnam2024stats.html. For another thing, the 2024 Putnam problems are in their RL data. Also, it's very unclear how these competitions, which consist of problems designed to have clear-cut answers and to be solved by (well-prepared) humans in an hour, will translate to anything else.
Why do you think that the 2024 Putnam problems they used for testing were in the training data?
/? "Art of Problem Solving" Putnam https://www.google.com/search?q=%22Art+of+Problem+Solving%22...
From p.3 of the PDF:
> Curating Cold Start RL Data: We constructed our initial training data through the following process:
> 1. We crawled problems from Art of Problem Solving (AoPS) contests, prioritizing math olympiads, team selection tests, and post-2010 problems explicitly requiring proofs, totaling 17,503 problems.