Hacker News new | ask | show | jobs
by canucker2016 454 days ago
You'd think so, but both Google's AI Overview and Bing's CoPilot output wrong answers.

Google spits out: "The product of the three numbers is 10,225 (65 * 20 * 8). The three numbers are 65, 20, and 8."

Whoa. Math is not AI's strong suit...

Bing spits out: "The solution to the three people in a circle puzzle is that all three people are wearing red hats."

Hats???

Same text was used for both prompts (all the text after 'For those curious the riddle is:' in the GP comment), so Bing just goes off the rails.

3 comments

That's a non-sequitur, they would be stupid to run ab expensive _L_LM for every search query. This post is not about Google Search being replaced by Gemini 2.5 and/or a chatbot.
Yes, putting an expensive LLM response atop each search query would be quite stupid.

You know what would be even stupider? Putting a cheap, wrong LLM response atop each search query.

Google placed its "AI overview" answer at the top of the page.

The second result is this reddit.com answer, https://www.reddit.com/r/math/comments/32m611/logic_question..., where at least the numbers make sense. I haven't examined the logic portion of the answer.

Bing doesn't list any reddit posts (that Google-exclusive deal) so I'll assume no stackexchange-related sites have an appropriate answer (or bing is only looking for hat-related answers for some reason).

I might have been phrasing poorly. With _L_ (or L as intended), I meant their state-of-the-art model, which I presume Gemini 2.5 is (didn't come around to TFA yet). Not sure if this question is just about model size.

I'm eagerly awaiting an article about RAG caching strategies though!

The riddle has a different variants with hats https://erdos.sdslabs.co/problems/5
There's 3 toddlers on the floor. You ask them a hard mathematical question. One of the toddlers plays around pieces of paper on the ground and happens to raise one that has the right answer written on it.

- This kid is a genius! - you yell

- But wait, the kid has just picked an answer from the ground, it didn't actually come up...

- But the other toddlers could do it also but didn't!