| HN Mirror



	by malisper 444 days ago
	Other models aren't able to solve it, so there's something else happening besides it being in the training data. You can also vary the problem and give it a number like 85 instead of 65, and Gemini is still able to properly reason through the problem.

4 comments

lolinder 444 days ago

I'm sure you're right that it's more than just it being in the training data, but that it's in the training data means that you can't draw any conclusions about general mathematical ability using just this as a benchmark, even if you substitute numbers.

There are lots of possible mechanisms by which this particular problem would become more prominent in the weights in a given round of training even if the model itself hasn't actually gotten any better at general reasoning. Here are a few:

* Random chance (these are still statistical machines after all)

* The problem resurfaced recently and shows up more often than it used to.

* The particular set of RLHF data chosen for this model draws out the weights associated with this problem in a way that wasn't true previously.


mrtesthah 443 days ago

Google Gemini 2.5 is able to search the web, so if you're able to find the answer on Reddit, maybe it can too.


mattkevan 443 days ago

I think there’s a big push to train LLMs on maths problems - I used to get spammed on Reddit with ads for data tagging and annotation jobs.

Recently these have stopped, and now the ads are about becoming a maths tutor to AI.

Doesn’t seem like a role with long-term prospects.


7e 444 days ago

Sure, but you can't cite this puzzle as proof that this model is "better than 95+% of the population at mathematical reasoning" when the method of solving it (the "answer") is online, and the model has surely seen it.


stabbles 443 days ago

It gets it wrong when you give it 728. It claims (728, 182, 546). I won't share the answer so it won't appear in the next training set.


WithinReason 443 days ago

With 728 the puzzle doesn't work, since it's divisible by 8.


eru 443 days ago

But then the AI should tell you that, too, if it really understands the problem?


stabbles 443 days ago

Fair, the question is what possible solutions exist.
