LLMs are not trained to deal with numbers as such. To them a number is a list of symbols: some obvious things will be calculated correctly, some others won't. It's kind of like they live in a pre-digits world, since 0 will be one token and 100 is also likely one token, but 98 may be two. They don't switch to a "these are numbers and require a different kind of reasoning" mode. They read/write a story about those "words".
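To make the token point concrete, here is a toy sketch. The vocabulary `TOY_VOCAB` and the function `toy_tokenize` are made up for illustration, and greedy longest-match is a simplification of what real tokenizers (BPE and friends) do, so the exact splits will differ from any real model.

```python
# Toy illustration only: TOY_VOCAB is a made-up vocabulary, not any real
# model's token list. Real tokenizers learn their pieces from data.
TOY_VOCAB = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "100", "20", "200"}


def toy_tokenize(text: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Greedy longest-match split of a digit string into vocabulary pieces."""
    tokens = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate first, then shrink until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}: {text[i]!r}")
    return tokens


for number in ["0", "100", "98", "1000"]:
    print(number, "->", toy_tokenize(number))
```

With this made-up vocabulary, "100" stays whole while "98" falls apart into "9" and "8", and "1000" becomes "100" followed by "0". Nothing in the pieces tells the model they are digits of one quantity.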
Most of the LLMs being studied popularly have not been trained with significant emphasis on arithmetic accuracy or mathematical reasoning. Those subjects represent a vanishing minority of their corpus and consequently map poorly to the tokenization.
Essentially every obvious optimization here is currently bearing fruit simultaneously in smaller studies, and incrementally larger models should continue to exhibit performance gains even without a particular focus on this area.
They encode words in tokens. Since you can't encode numbers reversibly, they end up as a number token. Using @rabbits_2002's example: on the internet there is a joke written as "what weighs more: 1 lb of feathers or 1 lb of bricks" with the solution "they weigh the same". Since nobody made this case before, it could only give the next most probable answer.
It depends on the data the model is using to generate the answer. In the case of the example, it seemed to prioritize the logic over the mathematics. So it sought patterns in logic to mimic. That is the ELI5 version.
The more complicated version would be that it is not prioritizing mathematical functions as much and is instead relying on various deductions, and these deductions are based on a whole chain of logic that has not been properly sorted out for reliability and applicability.
Because they don't do math. They associate words. When you tell an LLM "two plus two" it doesn't translate that to 2+2 and plug it into a math program; it just pulls out words associated with the phrase "two plus two".
At this point ChatGPT can do math by first predicting the algorithm and then handing it off to an execution engine - Python. So if that's the gap, I'd say they're closing it.
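A rough sketch of what that hand-off could look like, not how ChatGPT actually implements it. `fake_model` is a hypothetical stand-in for the LLM with canned answers, and `evaluate_arithmetic` is an illustrative helper that only accepts plain arithmetic, so the "execution engine" never runs arbitrary code.

```python
import ast
import operator

# Only these arithmetic operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_arithmetic(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def fake_model(question: str) -> str:
    """Hypothetical stand-in for the LLM: it returns an expression, not an answer."""
    canned = {"two plus two": "2 + 2", "twelve divided by two": "12 / 2"}
    return canned[question]


question = "two plus two"
expression = fake_model(question)
print(expression, "=", evaluate_arithmetic(expression))
```

The split is the point: the model only has to get the expression right, and the arithmetic itself is done by ordinary code.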
Yes, that's a fair distinction - although I think the practical implications aren't important. There's no reason why an LLM has to be AGI if an LLM + Python is AGI.
Researchers in ML and neuroscience disagree with you.
You have a superficial grasp of the topic. Your refusal to engage with the literature suggests an underlying insecurity regarding machine intelligence.
Good luck navigating this topic with such a mental block, it's a great way to remain befuddled.
> in 2020 neuroscientists introduced the Tolman-Eichenbaum Machine (TEM) [1], a mathematical model of the hippocampus that bears a striking resemblance to transformer architecture.
> How many legs do ten elephants have, if two of them are legless?
Output:
> If two out of ten elephants are legless, the remaining eight elephants would have a total of 8 legs each, just like any normal elephant. Therefore, in total, the ten elephants would have 8×8=64 legs altogether.
It's interesting this insistence from both Bard and now ChatGPT 3.5 that elephants have eight legs. I wonder if the reason is that, by the time they output the "elephants have n legs" part, they are also "thinking" about the result of 10 - 2. As if that number draws a lot of focus and is readily available when looking for the normal number of legs of an elephant.
Edit: just tried on ChatGPT 3.5:
Q: Think about the edges of a hexagon, the square root of 36, and the result of 12 divided by 2. Then answer the question: How many legs do 8 elephants have, if two of them are legless?
A: The edges of a hexagon have 6 sides, the square root of 36 is 6, and the result of 12 divided by 2 is 6. So, if two elephants are legless, the remaining 6 elephants would have a total of 36 legs.
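For reference, the arithmetic both prompts are asking for, assuming an ordinary elephant has four legs and "legless" means zero legs. `total_legs` is just a name picked for this sketch.

```python
LEGS_PER_ELEPHANT = 4  # assumption: ordinary elephants have four legs


def total_legs(elephants: int, legless: int) -> int:
    """Total legs in a herd where `legless` of the animals have none."""
    return (elephants - legless) * LEGS_PER_ELEPHANT


print(total_legs(10, 2))  # first prompt: 32
print(total_legs(8, 2))   # second prompt: 24
```

So the expected answers are 32 and 24, against the 64 and 36 the models produced.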
(Extremely simplified for eli5)