by travelalberta 243 days ago

> LLMs can't do math.

Ignoring conversations about 'reasoning', at a fundamental level LLMs do not 'do math' in the way that a calculator or a human does math. Sure, we can train bigger and bigger models that give you the impression of this, but there are theoretical arguments out there that as task complexity increases (in this case multi-digit multiplication), the probability of an incorrect prediction eventually converges to 1 (https://arxiv.org/abs/2305.18654).

> And your 2nd and third point about planning and compounding errors remain challenges.. probably unsolvable with LLM approaches.

The same issue applies here, really with any complex multi-step problem.

> Again, mere months later the o series of models came out, and basically proved this point moot. Turns out RL + long context mitigate this fairly well. And a year later, we have all SotA models being able to "solve" problems 100k+ tokens deep.

If you go hands-on with an agent in any decent-size codebase, session length and context size become noticeable issues. Again, mathematically, error propagation eventually leads to a 100% chance of error. Yann isn't wrong here; we've just kicked the can a little further down the road. What happens at 200k+ tokens? 500k+ tokens? 1M tokens? The underlying issue of a stochastic system isn't addressed.
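To put rough numbers on the compounding argument, here's a toy sketch. It assumes every step fails independently with the same fixed probability, which is my simplification and not something the linked paper or any real model guarantees; self-correction would break that assumption. The function names and the example error rate are made up for illustration.

```python
import math


def chance_of_any_error(per_step_error: float, steps: int) -> float:
    """Probability that at least one of `steps` steps goes wrong.

    Assumption (illustrative only): steps fail independently, each with
    the same probability `per_step_error`.
    """
    return 1.0 - (1.0 - per_step_error) ** steps


def steps_until_likely_failure(per_step_error: float, threshold: float = 0.5) -> int:
    """Smallest step count at which the chance of any error reaches `threshold`.

    Requires 0 < per_step_error < 1 and 0 < threshold < 1.
    """
    return math.ceil(math.log(1.0 - threshold) / math.log(1.0 - per_step_error))


if __name__ == "__main__":
    # Hypothetical per-step error rate, chosen only to show the trend.
    error_rate = 0.001
    for steps in (100, 1_000, 10_000, 100_000):
        print(steps, round(chance_of_any_error(error_rate, steps), 4))
```

Under those assumptions, any nonzero per-step error rate pushes the chance of at least one mistake toward 1 as the step count grows. A lower error rate only moves the point where it happens.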

> While Yann is clearly brilliant, and has a deeper understanding of the roots of the field than many of us mortals, I think he's been on a debbie downer trend lately

As he should be. Nothing he said was wrong at a fundamental level. The transformer architecture we have now cannot scale with task complexity. Which is fine; by nature it was not designed for such tasks. The problem is that people see these models work on a subset of small-scope complex projects and make claims that go against the underlying architecture. If a model is 'solving' complex or planning tasks but then fails at similar tasks of higher complexity, it's a sign that there is no underlying deterministic process. What is more likely: that the model is genuinely 'planning' or 'solving' complex tasks, or that the model has been trained on enough planning and task-related examples that it can make a high-probability guess?

> So, yeah, I'd take everything any one singular person says with a huge grain of salt. No matter how brilliant said individual is.

If anything, a guy like Yann, with a role such as his at a Mag7 company, being realistic (bearish if you are an LLM evangelist) about what the transformer architecture can do is a relief. I'm more inclined to listen to him than to a guy like Altman, who touts LLMs as the future of humanity while his path to profitability is AI TikTok, sex chatbots, and a third-party way to purchase things from Walmart during a recession.