	by rosstaylor90 482 days ago
	What's your AIME 2025 score? https://gr.inc/RJT1990/AIME2025/

1 comments

nyrikki 482 days ago

That is the point of the AIME: it is a 3-hour closed-book examination in which each answer is an integer from 0 to 999 and should depend only on pre-calc...for a human with no calculator, notes, or internet access.

The concepts are heavily covered in the training corpus, and if people were allowed to take it more than once, with even a book, let alone access to the internet, it wouldn't be very hard.

Examples:

1) Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b.$

In the corpus: https://www.quora.com/In-what-bases-b-does-b-7-divide-into-9...
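For reference, example 1 can be checked by brute force. This is a minimal sketch: the helper names `base_value` and `valid_bases` and the search limit of 1000 are my own assumptions, not part of the problem. The limit is safe because $9b+7 = 9(b+7) - 56$, so $b+7$ has to divide 56 and no base above 49 can qualify.

```python
def base_value(digits, b):
    """Interpret a list of digits (most significant first) as a base-b numeral."""
    value = 0
    for d in digits:
        value = value * b + d
    return value


def valid_bases(limit=1000):
    """Return every base b with 9 < b <= limit for which 17_b divides 97_b.

    `limit` is an arbitrary search bound chosen for illustration.
    """
    return [
        b
        for b in range(10, limit + 1)
        if base_value([9, 7], b) % base_value([1, 7], b) == 0
    ]


bases = valid_bases()
print(bases, sum(bases))
```

The two surviving bases correspond to the divisors 28 and 56 of 56, the only ones larger than 16.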

And one more:

3) https://artofproblemsolving.com/wiki/index.php/2025_AIME_I_P...

Is just the number of ways to distribute k indistinguishable balls (players) into n distinguishable boxes (flavors), without exclusion, in such a way that no box is empty.

Thus it is in the corpus for any course that needs to cover combinatorial problems, including physics, discrete math, logistics, etc.
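The balls-and-boxes count described above is the standard stars-and-bars result, $\binom{k-1}{n-1}$. Here is a rough sketch that compares the formula with direct enumeration. The function names and the sample values k=9, n=3 are assumptions picked for illustration. It models only the count as phrased in this comment, not any extra conditions the linked problem may impose.

```python
from itertools import product
from math import comb


def count_by_enumeration(k, n):
    """Count occupancy vectors (c1, ..., cn) with every ci >= 1 and sum k."""
    return sum(
        1
        for counts in product(range(1, k + 1), repeat=n)
        if sum(counts) == k
    )


def stars_and_bars(k, n):
    """Closed form: choose n-1 of the k-1 gaps between balls as dividers."""
    return comb(k - 1, n - 1)


# Hypothetical sample sizes, chosen only to exercise both functions.
k, n = 9, 3
print(count_by_enumeration(k, n), stars_and_bars(k, n))
```

The enumeration grows as k^n, so it is only useful as a sanity check on small inputs.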

IMHO these concept classes from a typical AIME are so common that the scores you gave demonstrate that those models are doing no "general reasoning" at all and are actually failing at approximate retrieval.

rosstaylor90 482 days ago

I disagree, 10 years ago AIs nailing these types of competition would have been seen as very impressive. The fact goal posts can move on this now shows how much AI has progressed.

(Also the term “approximate retrieval” is a bad one - reasoning is inherently a process of chaining together associations. What matters is whether the reasoning reaches the right conclusions. Still some way to go, but already very impressive in tasks traditionally considered harbours of human reasoning!)

CamperBob2 481 days ago

> I disagree, 10 years ago AIs nailing these types of competition would have been seen as very impressive.

It would have been seen as witchcraft.

bossyTeacher 481 days ago

> What matters is whether the reasoning reaches the right conclusions

no, it doesn't. a broken clock is right twice a day, reasoning is about the journey more than the destination

rosstaylor90 481 days ago

RL has more than two steps...

bossyTeacher 480 days ago

Point is that reasoning is about more than the conclusions. If your steps are wrong, your reasoning is wrong regardless of the conclusion. Poor reasoning is what could make an LLM conclude that 1 + 2 = 3 but that 2 + 1 = [some number other than 3]
