by DavidPastrnak 913 days ago
As someone with very little knowledge of LLMs, does anyone have an ELI5 of what causes this, or of why LLMs struggle so much with math?

LLMs are not trained to deal with numbers as such. To them it's a list of symbols, and some obvious things will be calculated correctly while others won't. It's kind of like they live in a pre-digit world: 0 will be one token, and 100 is also likely one token, but 98 may be two. They don't switch to a "these are numbers and require a different kind of reasoning" mode. They read/write a story about those "words".

(Extremely simplified for eli5)
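To make the tokenization point concrete, here is a toy sketch. Real LLM tokenizers use vocabularies learned from data (for example with byte-pair encoding) that contain tens of thousands of entries; the tiny `TOY_VOCAB` below is hypothetical, chosen only to show how greedy longest-match splitting can make some numbers a single token and others several unrelated pieces.

```python
# HYPOTHETICAL vocabulary: every single digit, plus a few multi-digit strings
# that a real, data-driven vocabulary might happen to contain.
TOY_VOCAB = set("0123456789") | {"10", "20", "100", "2000"}


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match: at each position, take the longest vocab entry that matches."""
    max_len = max(len(t) for t in vocab)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # Character not in the vocabulary: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


for number in ["0", "98", "100", "1000", "2000", "2024"]:
    print(number, "->", tokenize(number, TOY_VOCAB))
```

With this vocabulary, "100" and "2000" are single tokens, "98" becomes two tokens and "2024" becomes three ("20", "2", "4"), so numbers that are close in value can look structurally unrelated to the model.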

Most of the popularly studied LLMs have not been trained with significant emphasis on arithmetic accuracy or mathematical reasoning; those subjects represent a vanishing minority of their corpus and consequently map poorly to the tokenization.

Essentially every obvious optimization here is currently bearing fruit simultaneously in smaller studies, and incrementally larger models should continue to show performance gains even without a particular focus on this area.

They encode words as tokens. Since you can't encode numbers reversibly, they end up as a number token. Using @rabbits_2002's example: on the internet there is a joke, "What weighs more: 1 lb of feathers or 1 lb of bricks?", with the answer "They weigh the same." Since nobody had written about this exact case before, the model could only give the most probable next answer.
It depends on the data the model is using to generate the answer. In the case of the example, it seemed to prioritize the logic over the mathematics. So it sought patterns in logic to mimic. That is the ELI5 version.

The more complicated version is that it is not prioritizing mathematical functions much and instead relies on various deductions, and these deductions rest on a whole chain of logic that has not been properly sorted out for reliability and applicability.

Because they don't do math. They associate words. When you tell an LLM "two plus two", it doesn't translate that to 2+2 and plug it into a math program; it just pulls out words associated with the phrase "two plus two".
Math requires reasoning and logic, and LLMs do neither. They just generate plausible text.

That's why they're nowhere near AGI.
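A deliberately tiny caricature of "associating words" versus computing. This is a word-bigram counter trained on a hypothetical four-sentence corpus, not how a transformer works internally (transformers condition on the whole context through learned weights), but it shows the failure mode this comment describes: the continuation is picked by frequency, so an unseen sum gets the most familiar answer.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL miniature corpus; a real model is trained on vastly more text.
CORPUS = [
    "two plus two is four",
    "two plus two is four",
    "two plus two is four",
    "three plus three is six",
]


def train_bigrams(sentences):
    """Count which word follows which: pure association, no arithmetic."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows, prompt):
    """Return the word most often seen after the prompt's last word."""
    last = prompt.split()[-1]
    return follows[last].most_common(1)[0][0]


WORD_TO_INT = {"two": 2, "three": 3}
INT_TO_WORD = {4: "four", 5: "five", 6: "six"}


def compute(prompt):
    """What a calculator does instead: parse the operands and actually add them."""
    a, _plus, b, _is = prompt.split()
    return INT_TO_WORD[WORD_TO_INT[a] + WORD_TO_INT[b]]


model = train_bigrams(CORPUS)
prompt = "two plus three is"
print("association:", predict_next(model, prompt))
print("arithmetic: ", compute(prompt))
```

Because "is" was followed by "four" three times and "six" once, the association step answers "four", while the arithmetic step answers "five".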

At this point ChatGPT can do math by first predicting the algorithm and then handing it off to an execution engine - Python. So if that's the gap, I'd say they're closing it.
That's ChatGPT as a system. The LLM itself can't do math. It does something closer to translation in that case.
Yes, that's a fair distinction - although I think the practical implications aren't important. There's no reason why an LLM has to be AGI if an LLM + Python is AGI.
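A minimal sketch of the split described in these comments: the language model only has to translate the question into an expression, and a separate interpreter does the arithmetic exactly. This is not ChatGPT's actual tool-calling interface. The `model_output` string is hard-coded as a stand-in for what a model might generate, and `safe_eval` is a hypothetical helper that walks Python's `ast` rather than calling `eval()` on untrusted text.

```python
import ast
import operator

# Operators the hypothetical "execution engine" is willing to run.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_eval(expr: str):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        # Function calls, names, attribute access, etc. are rejected.
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


# Step 1 (the LLM's job, HYPOTHETICAL here): translate the question into code.
# Question: "How many legs do ten elephants have, if two of them are legless?"
model_output = "(10 - 2) * 4"

# Step 2 (the execution engine's job): run the expression exactly.
print(safe_eval(model_output))
```

The design choice is the point of the exchange above: the model can still get step 1 wrong (for example by writing `* 8`), but once the expression is right, step 2 cannot make an arithmetic slip.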
They are reasoning like a child. Within a year or two like an adult.
No. It is a computer program which uses statistics to generate plausible text. It does not do any form of reasoning, at all, childlike or otherwise.
You are drawing bad conclusions about whatever you define "generate plausible text" as.
Maybe you're the one drawing bad conclusions.
We will see who was drawing bad conclusions in a couple years. Whatever is said here won't change that.
Under that premise whatever our brains are doing won't count as reasoning either.

I'd suggest you look into modern neuroscience and topics such as predictive coding if you're interested in refining your views.

Our brains work nothing like LLMs do.
Researchers in ML and neuroscience disagree with you.

You have a superficial grasp of the topic. Your refusal to engage with the literature suggests an underlying insecurity regarding machine intelligence.

Good luck navigating this topic with such a mental block, it's a great way to remain befuddled.

> in 2020 neuroscientists introduced the Tolman-Eichenbaum Machine (TEM) [1], a mathematical model of the hippocampus that bears a striking resemblance to transformer architecture.

https://news.ycombinator.com/item?id=38758572

For what it’s worth, ChatGPT4 answers this question perfectly correctly.

> Ten elephants would have 32 legs if two of them are legless, as each elephant normally has four legs.

I just tried ChatGPT.

Input:

> How many legs do ten elephants have, if two of them are legless?

Output:

> If two out of ten elephants are legless, the remaining eight elephants would have a total of 8 legs each, just like any normal elephant. Therefore, in total, the ten elephants would have 8×8=64 legs altogether.

This insistence from both Bard and now ChatGPT 3.5 that elephants have eight legs is interesting. I wonder if the reason is that, by the time they output the "elephants have n legs" part, they are also "thinking" about the result of 10 - 2, as if that number draws a lot of focus and is readily available when looking for the normal number of legs of an elephant.

Edit: just tried on ChatGPT 3.5:

Q: Think about the edges of a hexagon, the square root of 36, and the result of 12 divided by 2. Then answer the question: How many legs do 8 elephants have, if two of them are legless?

A: The edges of a hexagon have 6 sides, the square root of 36 is 6, and the result of 12 divided by 2 is 6. So, if two elephants are legless, the remaining 6 elephants would have a total of 36 legs.

My mistake - I had it on 3.5.
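The arithmetic in the two prompts can be checked directly, along with the guess above that the result of 10 - 2 "draws focus". The sketch compares the correct count with a hypothetical slip in which the number of elephants that still have legs is reused as the number of legs per elephant. It only tests whether that guess reproduces the two wrong answers; it says nothing about what the model actually does internally.

```python
def correct_legs(elephants: int, legless: int, legs_each: int = 4) -> int:
    """Total legs when `legless` of the elephants have none."""
    return (elephants - legless) * legs_each


def hypothesized_slip(elephants: int, legless: int) -> int:
    """The commenter's guess (NOT a verified mechanism): the salient
    'elephants - legless' number is reused as legs per elephant."""
    with_legs = elephants - legless
    return with_legs * with_legs


# (elephants, legless, answer GPT-3.5 gave in the quotes above)
for elephants, legless, reported in [(10, 2, 64), (8, 2, 36)]:
    print(
        f"{elephants} elephants, {legless} legless: "
        f"correct={correct_legs(elephants, legless)}, "
        f"slip={hypothesized_slip(elephants, legless)}, reported={reported}"
    )
```

Both wrong answers (64 and 36) equal the slip values, while the correct totals are 32 and 24. That is consistent with the guess but does not prove it, and in the hexagon prompt the primed 6 and the result of 8 - 2 coincide, so the two explanations can't be told apart there.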