Hacker News
by otabdeveloper4 13 days ago
> show me a 4th grade math problem they can't handle

Sure.

"8 7 6 5 4 3 2 1 - add minus signs and parentheses to get 31."

P.S. There is an answer online and some LLMs will just copy it verbatim. This doesn't count.

3 comments

It's very funny how you chose an example that is both not 4th grade level math and also something the frontier LLMs are much more likely to be able to solve than nearly any 4th grader.

This is a counterexample to your argument, not evidence for your claim. The only possible conclusion from this example is "whoa, it's amazing that we have AIs capable of solving this kind of difficult math problem!", which is very much the opposite of "these AIs can't even do my 4th grader's math homework".

Whoa, 4th grade math problems got hard! I'm not sure how I'd tackle that one myself.
GPT-5.5 found a solution only after assuming that you're allowed to concatenate digits, e.g. 8 7 becomes 87 (it complained at first that the problem was "under-specified"). Using Python, it brute-forced the search, in fact finding 13 solutions: https://chatgpt.com/share/6a1db54f-7ab8-8333-9218-86a469c284...

Are you sure this is 4th grade level?
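
For reference, the kind of brute force described above fits in a few lines of Python. This is a minimal sketch, not the code from the linked chat: it assumes that gluing adjacent digits (8 7 becoming 87) is allowed, that the digits stay in their original order, and that minus is the only operator. The names `groupings`, `reachable` and `solutions` are made up for illustration.

```python
from functools import lru_cache
from itertools import product

DIGITS = "87654321"
TARGET = 31

def groupings(digits):
    """Yield every way to split the digit string into numbers by gluing neighbours."""
    for cuts in product([False, True], repeat=len(digits) - 1):
        terms, current = [], digits[0]
        for ch, cut in zip(digits[1:], cuts):
            if cut:                      # a minus sign goes in this gap
                terms.append(int(current))
                current = ch
            else:                        # no sign here: glue the digit on
                current += ch
        terms.append(int(current))
        yield tuple(terms)

def reachable(terms):
    """Return every value obtainable from terms using only '-' and parentheses."""
    @lru_cache(maxsize=None)
    def values(lo, hi):
        if hi - lo == 1:
            return frozenset([terms[lo]])
        out = set()
        # Try each position for the outermost minus sign in terms[lo:hi].
        for mid in range(lo + 1, hi):
            for a in values(lo, mid):
                for b in values(mid, hi):
                    out.add(a - b)
        return frozenset(out)
    return values(0, len(terms))

# Groupings (with at least one minus sign) that can be bracketed to give 31.
solutions = [g for g in groupings(DIGITS) if len(g) > 1 and TARGET in reachable(g)]
for g in solutions:
    print(g)
```

The search covers all 2^7 = 128 ways of gluing digits and, for each, every possible bracketing. It reports which groupings can hit 31, not how many distinct bracketings do so.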

I questioned OP's "there is an answer online" claim, so I checked. The only source I found for the original question was 5th grade material from a Russian school for mathematics.

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

Apparently there is a way to solve this without brute-forcing all the combinations. It has to do with looking at how many even and odd numbers there are, taking into account that the goal number is odd, and then thinking through the combinations [even-even=even, even-odd=odd,…]

Though this is obviously not something I would expect a 4th grader to solve.
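
The parity argument is easy to check mechanically. A rough sketch in Python, assuming no concatenation: any bracketing of a chain of subtractions just gives each digit a plus or minus sign, and flipping a sign never changes whether the total is even or odd. The code enumerates every sign pattern with the leading 8 kept positive, which is a superset of what parentheses can produce. Variable names are illustrative.

```python
from itertools import product

digits = [8, 7, 6, 5, 4, 3, 2, 1]

# Every value of 8 ± 7 ± 6 ± ... ± 1 (the leading 8 is always positive).
totals = {
    digits[0] + sum(sign * d for sign, d in zip(signs, digits[1:]))
    for signs in product((1, -1), repeat=len(digits) - 1)
}

# 8+7+...+1 = 36 is even, and flipping the sign of d changes the total by 2*d,
# so every total is even and the odd target 31 can never appear.
all_even = all(t % 2 == 0 for t in totals)
print(all_even, 31 in totals)

# Gluing changes the picture: a glued number is odd only if its last digit is odd,
# so turning 5 4 into 54 removes one odd term and makes an odd total possible.
glued = [87, 6, 54, 3, 2, 1]
print(sum(glued) % 2)
```

So with single digits only, 31 is out of reach, which is why the puzzle forces you to glue at least one odd digit in front of an even one.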

> 4th grade math problem

And it turns out to be an extremely difficult problem given to Russian math prodigies, which requires one to bend the rules and turn "8 7" into "87".

It's a standard "Russian math" problem. There's boatloads more where that came from, and none of them are solved by LLMs.