by RC_ITR
119 days ago
Here are the scores for the new AIME problems, where we know the answers aren't in the training data: https://matharena.ai/?view=problem&comp=aime--aime_2026 As for MMLU, is your assertion that these AI labs are not correcting for errors in these exams and are then self-reporting scores below 100%? As implied by the video, wouldn't it take one intern a week at most to fix those errors, allowing any AI lab to become the first to consistently score 100% on MMLU? I can guarantee Moonshot, DeepSeek, or Alibaba would be all over the opportunity to do just that if it were a real problem.