by bluecoconut
637 days ago
Oh, they do talk about it:

"On the 2024 AIME exams, GPT-4o only solved on average 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function. A score of 13.9 places it among the top 500 students nationally and above the cutoff for the USA Mathematical Olympiad."

So as they increase k, the number of samples in the ensemble, the score keeps climbing, all the way up to 93% when re-ranking 1000 samples.
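For anyone unsure what the two aggregation modes in that quote mean, here is a rough sketch. The quote does not say how consensus is computed or what the learned scoring function looks like, so the plain majority vote and the toy `score` callable below are assumptions for illustration only, and the function names are made up.

```python
from collections import Counter
from typing import Callable, Sequence


def consensus_answer(samples: Sequence[str]) -> str:
    """Consensus (assumed here to be majority vote): most frequent answer wins."""
    return Counter(samples).most_common(1)[0][0]


def rerank_answer(samples: Sequence[str], score: Callable[[str], float]) -> str:
    """Re-ranking: keep the sample that a scoring function rates highest.

    `score` stands in for the learned scorer, which is not described in the quote.
    """
    return max(samples, key=score)


# Hypothetical final answers from k = 5 samples of one problem.
samples = ["204", "204", "113", "204", "097"]
majority = consensus_answer(samples)
best = rerank_answer(samples, score=lambda a: 1.0 if a == "113" else 0.0)
```

The difference is that consensus can only pick an answer many samples agree on, while re-ranking can surface a rare answer if the scorer rates it highly.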