by hmottestad 454 days ago

This looks like it was posted on Reddit 10 years ago:

https://www.reddit.com/r/math/comments/32m611/logic_question...

So it’s likely that it’s part of the training data by now.

canucker2016 454 days ago

You'd think so, but both Google's AI Overview and Bing's Copilot output wrong answers.

Google spits out: "The product of the three numbers is 10,225 (65 * 20 * 8). The three numbers are 65, 20, and 8."

Whoa. Math is not AI's strong suit...

Bing spits out: "The solution to the three people in a circle puzzle is that all three people are wearing red hats."

Hats???

Same text was used for both prompts (all the text after 'For those curious the riddle is:' in the GP comment), so Bing just goes off the rails.
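For anyone who wants to check the arithmetic in the quoted AI Overview answer, here is a minimal Python sketch. The variable names are just illustrative; the only inputs are the three numbers and the product quoted above.

```python
# Numbers and product exactly as quoted in the AI Overview answer.
factors = (65, 20, 8)
claimed_product = 10_225

# Multiply the three quoted numbers together.
actual_product = 1
for n in factors:
    actual_product *= n

print(actual_product)                     # 10400
print(actual_product == claimed_product)  # False
```

So the quoted answer is not even consistent with itself: the three numbers it lists multiply to 10,400, not 10,225.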

moritzwarhier 454 days ago

That's a non sequitur. They would be stupid to run an expensive _L_LM for every search query. This post is not about Google Search being replaced by Gemini 2.5 and/or a chatbot.

michaelt 454 days ago

Yes, putting an expensive LLM response atop each search query would be quite stupid.

You know what would be even stupider? Putting a cheap, wrong LLM response atop each search query.

canucker2016 454 days ago

Google placed its "AI overview" answer at the top of the page.

The second result is this reddit.com answer, https://www.reddit.com/r/math/comments/32m611/logic_question..., where at least the numbers make sense. I haven't examined the logic portion of the answer.

Bing doesn't list any Reddit posts (that Google-exclusive deal), so I'll assume no Stack Exchange-related sites have an appropriate answer (or Bing is only looking for hat-related answers for some reason).

moritzwarhier 454 days ago

I might have phrased that poorly. With _L_ (or L as intended), I meant their state-of-the-art model, which I presume Gemini 2.5 is (I haven't gotten around to TFA yet). Not sure if this question is just about model size.

I'm eagerly awaiting an article about RAG caching strategies though!

vicek22 454 days ago

The riddle has different variants with hats: https://erdos.sdslabs.co/problems/5

Etherlord87 454 days ago

There are 3 toddlers on the floor. You ask them a hard mathematical question. One of the toddlers plays around with pieces of paper on the ground and happens to raise one that has the right answer written on it.

- This kid is a genius! - you yell

- But wait, the kid has just picked an answer from the ground, it didn't actually come up...

- But the other toddlers could do it also but didn't!

malisper 454 days ago

Other models aren't able to solve it, so there's something else happening besides it being in the training data. You can also vary the problem and give it a number like 85 instead of 65, and Gemini is still able to properly reason through the problem.

lolinder 454 days ago

I'm sure you're right that it's more than just it being in the training data, but that it's in the training data means that you can't draw any conclusions about general mathematical ability using just this as a benchmark, even if you substitute numbers.

There are lots of possible mechanisms by which this particular problem would become more prominent in the weights in a given round of training even if the model itself hasn't actually gotten any better at general reasoning. Here are a few:

* Random chance (these are still statistical machines after all)

* The problem resurfaced recently and shows up more often than it used to.

* The particular set of RLHF data chosen for this model draws out the weights associated with this problem in a way that wasn't true previously.

mrtesthah 454 days ago

Google Gemini 2.5 is able to search the web, so if you're able to find the answer on reddit, maybe it can too.

mattkevan 454 days ago

I think there’s a big push to train LLMs on maths problems - I used to get spammed on Reddit with ads for data tagging and annotation jobs.

Recently these have stopped, and now the ads are about becoming a maths tutor to AI.

Doesn’t seem like a role with long-term prospects.

7e 454 days ago

Sure, but you can't cite this puzzle as proof that this model is "better than 95+% of the population at mathematical reasoning" when the method of solving it (the "answer") is online, and the model has surely seen it.

stabbles 454 days ago

It gets it wrong when you give it 728. It claims (728, 182, 546). I won't share the answer so it won't appear in the next training set.

WithinReason 454 days ago

with 728 the puzzle doesn't work since it's divisible by 8
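A quick sanity check of the numbers mentioned in this sub-thread, as a minimal Python sketch. It only verifies the arithmetic; the riddle text is not quoted here, so it does not show why divisibility by 8 would break the puzzle. The variable names are illustrative.

```python
# The replacement number tried above and the triple the model reportedly claimed.
n = 728
claimed_triple = (728, 182, 546)

# 728 is indeed a multiple of 8 (728 = 8 * 91).
print(n % 8 == 0)  # True
print(n // 8)      # 91

# The two smaller numbers in the claimed triple add up to the largest one.
print(claimed_triple[1] + claimed_triple[2] == claimed_triple[0])  # True
```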

eru 454 days ago

But then the AI should tell you that, too, if it really understands the problem?

stabbles 454 days ago

Fair, the question is what possible solutions exist.

toonalfrink 441 days ago

This whole answer hinges on knowing that 0 is not a positive integer, that's why I couldn't figure it out...

f1shy 454 days ago

Thanks. I wanted to do exactly that: find the answer online. It is amazing that people (even on HN) think that LLMs can reason. They just regurgitate the input.

jug 452 days ago

Have you given a reasoning model a novel problem and watched its chain of thought process?

Etherlord87 454 days ago

I think it can reason. At least if it can work in a loop ("thinking"). It's just that this reasoning is far inferior to human reasoning, despite what some people hastily claim.

motoxpro 454 days ago

I would say that 99.99% of humans do the same. Most people never come up with anything novel.

f1shy 454 days ago

I would say maybe about 80%, certainly not 99.99%. But I've seen that in college: some would only be able to solve problems that were pretty much the same as ones they had already seen. Notably, some guys could easily come up with solutions to complex problems they had not seen before. In my opinion, no human at age 20 can have had the amount of input an LLM has today. And still, humans of age 20 do come up with very new ideas pretty often (new in the sense that (s)he has not seen that or anything like it before). Of course there are more and less creative/intelligent people...

WA 454 days ago

Reasoning != coming up with something novel.

drexlspivey 454 days ago

And if it wasn’t, it is now.
