Hacker News
by jhanschoo 757 days ago
> They're not able to reason, but we can't succinctly define what it is.

For transformer-based LLMs, and most LLMs in general, there's an obvious class of problems that they cannot solve. LLMs generally perform bounded computation per token, so they cannot reason about computational problems that are more than linearly complex, for a sufficiently large input instance. If you have a back-and-forth (many-shot), your LLM can possibly use the context as state to solve harder problems, but only up to the context window, of course.
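
To make the budget argument concrete, here is a rough sketch in Python. It is a toy model, not a description of any real LLM: `budget_per_token` (elementary steps one generated token can buy), `context_window` (tokens that can be generated) and the `cost` functions are all made-up assumptions for illustration.

```python
def tokens_needed(cost, n, budget_per_token):
    """Tokens required to pay for cost(n) steps at a fixed budget per token."""
    return -(-cost(n) // budget_per_token)  # integer ceiling division


def max_solvable_size(cost, budget_per_token, context_window):
    """Largest n whose cost fits in the total budget of one context window.

    Toy assumption: total work available = budget_per_token * context_window.
    """
    total_budget = budget_per_token * context_window
    n = 0
    while cost(n + 1) <= total_budget:
        n += 1
    return n


def linear(n):
    return n


def quadratic(n):
    return n * n


if __name__ == "__main__":
    # Hypothetical numbers: 100 steps per token, 1000 tokens of context.
    print(max_solvable_size(linear, 100, 1000))
    print(max_solvable_size(quadratic, 100, 1000))
    print(tokens_needed(quadratic, 1000, 100))
```

Under these assumed numbers the quadratic problem runs out of budget at a much smaller instance size than the linear one, and an instance of size 1000 would need more tokens than the window holds.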

3 comments

Humans can realise they don’t understand something and seek out more knowledge until they do. Humans can also build complex structures out of simple fundamentals: the same logic of counting beans on a table can be extended to multiplying that table of beans, then to counting horses the same way you count beans but valuing each horse at multiple beans, and then to simplifying all of that by trading promises of beans in exchange for horses.

The fact that so many people can’t see the fundamental differences between an LLM and human intelligence reminds me of when the very early computer scientists thought they could model the entirety of nature by reducing every “component” to a numeric value and computing it as a “transfer of energy”.

Quite literally they did the same thing: They had a new toy (very advanced computation machines) and forced all of nature to “fit” within it. It also ended in failure, obviously. Not because nature or ecosystems (as it was coined) are “magic” but because grossly oversimplifying reality to fit desired models is a fool’s errand.

We’ll have to wait and see how far multimodal training takes us. Text-only models are extremely limited by the kind of information we can encode as text and by the loss of detail, e.g. the word “cat” vs. an image of a cat vs. video of a cat vs. direct physical interaction with a cat vs. being a mammal that shares a great deal of biology with a cat. You need a table and beans before you can invent a method for counting them.

> LLMs generally perform bounded computation per token, so they cannot reason about computational problems that are more than linearly complex, for a sufficiently large input instance.

I can’t judge if this is true, because I don’t know transformers well, but if it is, it unravels an intuitive thought I’ve never been able to articulate about not only LLMs, but possibly all pattern matching and the human analog of System 1 thinking.

Another fuzzy way of saying this is there’s something irreducible about complexity that can’t be pattern matched by any bounded heuristic – that it’s wishful thinking to assume historical data contains hidden higher-level patterns that unlock magical shortcuts to novel problems.

> it’s wishful thinking to assume historical data contains hidden higher-level patterns that unlock magical shortcuts to novel problems

In the right context, why not? You rely on this every day to navigate the world with more facility than a newborn.

Have you heard about the different formal notions of complexity and especially Kolmogorov complexity?

Humans have the same limitation and use the same solution: showing your work and taking notes. There's no blocker here.

There is a distinction. Humans with an unbounded scratchpad can emulate a general-purpose Turing machine and perform general computation given unbounded time. An LLM is still restricted to its context window, which is a comparatively extreme limitation on memory. In comparison, our general-purpose computers have so much memory that this isn't something we care about for most practical instances of hard problems that we solve with a classical CS algorithm. You can obviously modify LLMs to perform unbounded computation per token (and furnish them with a scratchpad), but afaict commercial LLMs today don't offer that.
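
A small sketch of that distinction, in Python: each step below does a constant amount of work, and all state lives on an external tape that acts as the scratchpad. The machine (a binary incrementer), the `OutOfTape` error and the `max_cells` cap are made up for illustration; `max_cells` only loosely stands in for a context window.

```python
class OutOfTape(Exception):
    """Raised when the scratchpad would grow past its allowed size."""


# Transition table for a binary incrementer; the head starts on the
# rightmost bit. (state, symbol) -> (symbol to write, head move, next state)
INCREMENT = {
    ("carry", "1"): ("0", -1, "carry"),
    ("carry", "0"): ("1", 0, "halt"),
    ("carry", "_"): ("1", 0, "halt"),  # "_" is a blank cell
}


def run(table, bits, max_cells=None):
    """Run the machine on `bits`; max_cells=None means an unbounded tape."""
    tape = {i: b for i, b in enumerate(bits)}
    head, state = len(bits) - 1, "carry"
    while state != "halt":
        symbol = tape.get(head, "_")
        write, move, state = table[(state, symbol)]
        # Writing to a fresh cell grows the scratchpad by one cell.
        if head not in tape and max_cells is not None and len(tape) >= max_cells:
            raise OutOfTape(f"needs more than {max_cells} cells")
        tape[head] = write
        head += move
    return "".join(tape[i] for i in sorted(tape))


if __name__ == "__main__":
    print(run(INCREMENT, "1011"))  # 1100
    print(run(INCREMENT, "1111"))  # 10000, the tape grew by one cell
    try:
        run(INCREMENT, "1111", max_cells=4)
    except OutOfTape as err:
        print("capped scratchpad:", err)
```

The step logic is identical in both runs; the only thing that changes is whether the scratchpad is allowed to grow.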