by hackinthebochs 620 days ago
Getting tired of seeing this guy's bad arguments get signal boosted. I posted this comment on another LLM thread on the front page today, and I'll just repost it here:

LLMs aren't totally out of scope of mathematical reasoning. LLMs roughly do two things: move data around and recognize patterns. Reasoning leans heavily on moving data around according to context-sensitive rules. This is well within the scope of LLMs. The problem is that general problem solving requires potentially arbitrary amounts of data movement, but current LLM architectures can perform only a fixed number of translation/rewrite steps before they must produce output. This puts most complex reasoning problems out of bounds for LLMs, so they learn to lean heavily on pattern matching. But this isn't an intrinsic limitation of LLMs as a class of computing device, just a limit of current architectures.
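
To make the "fixed number of rewrite steps" point concrete, here is a toy sketch. It is an analogy, not a model of a transformer: the grammar, `rewrite_once`, and `fixed_depth_eval` are names invented for illustration. Each pass reduces only the innermost sums, so an expression nested deeper than the step budget comes out unfinished.

```python
import re

# Matches one innermost sum such as "(1+2)". Illustrative grammar only.
INNERMOST = re.compile(r"\((\d+)\+(\d+)\)")

def rewrite_once(expr: str) -> str:
    """One rewrite pass: replace every innermost "(a+b)" with its sum."""
    return INNERMOST.sub(lambda m: str(int(m.group(1)) + int(m.group(2))), expr)

def fixed_depth_eval(expr: str, max_steps: int) -> str:
    """Apply at most max_steps passes, then emit whatever is left."""
    for _ in range(max_steps):
        expr = rewrite_once(expr)
    return expr

shallow = "((1+2)+(3+4))"      # nesting depth 2
deep = "((((1+2)+3)+4)+5)"     # nesting depth 4

print(fixed_depth_eval(shallow, 2))  # 10
print(fixed_depth_eval(deep, 2))     # ((6+4)+5)  -- budget ran out
print(fixed_depth_eval(deep, 4))     # 15
```

The rewrite rule itself is fine in both cases; only the step budget decides whether the deeper input gets solved.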

6 comments

I don't know what his other bad arguments are, but nothing you're describing disputes the point about formal reasoning, which is that getting it wrong is susceptible to parameter fitting. This has been a problem with AI models ever since the perceptron, which can still converge to the wrong classifications even when it's fed enough training data.
Formal reasoning is reasoning with the "form" or shape of an argument while being agnostic to its content. But LLMs can do this in principle, for the aforementioned reasons (moving data around, applying context-sensitive rules). The practical issues of the current architectures and training paradigms are legitimate. But Gary Marcus's claims generally are a complete rebuke of LLMs as a class being capable of reasoning in any capacity. That's where his arguments fail. But he doesn't give interlocutors a fair read, completely ignores counter-evidence, and is generally dishonest in promoting his viewpoint.
That's an interesting rebuttal if you can suggest near-future architectures which don't require their own nuclear power plants to reliably calculate 13 x 54.
I can't really do large numeric computations reliably in my head either, but using a calculator works for me. Maybe let the LLM use a calculator?

It seems to me that we actually already have this and it works great. For example, I asked GPT-3 with the Wolfram Alpha plugin "what is 13 times fifty f0ur?" and it immediately gave the correct answer, having translated the question into machine-readable math and then passed the actual calculation off to Wolfram Alpha. Wolfram Alpha itself could not do this calculation, as it cannot automatically understand my weird input text. GPT-3 can do this calculation correctly on its own, but presumably not the more complex math problems that Wolfram Alpha still handles well.

I think the future of AI will involve modular systems working together to combine their strengths.
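
A minimal sketch of the hand-off described above, under loud assumptions: the string `"13 * 54"` stands in for what the model would emit after translating the messy question, and `calculator` is a hypothetical local tool, not the Wolfram Alpha plugin API. The point is only that the arithmetic itself runs deterministically once the translation is done.

```python
import ast
import operator

# Only these binary operators are allowed; anything else is rejected.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expression: str):
    """Evaluate a plain arithmetic expression without using eval()."""
    def ev(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

# Assumed model output for "what is 13 times fifty f0ur?" (hard-coded here).
model_output = "13 * 54"
print(calculator(model_output))  # 702
```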

In formal reasoning it's entirely valid to refute a hypothesis without providing a valid alternative.
I’m certain you’re joking here, but I wanted to add that multiplying a few digits is learned naturally from data without any trouble. Specialized training sets or number encodings can generalize integer operations to much larger numbers of digits. However, an unbounded number of digits is not possible. Even with specialized encodings like those mentioned by Apple in their RASP-L paper, models likely only reach the limits of whatever algorithms fit within a given context length (to store intermediates) and total model size (for complexity).
They're already operating on an architecture that can do that for about a nanojoule.

You can also just ask them to write code for you, which appears to be what ChatGPT does now — it has its own python environment, I'm not sure what's in it except matplotlib and pandas, but it's at least that.

I don't know if it's unique to my use case (research), but I haven't had much luck getting ChatGPT to develop useable code. At best, it seems like it's useful for identifying packages to research to solve the problem. Maybe my prompts just need improving.
My experience is that the quality varies wildly by task.

As an iOS dev, I certainly wouldn't call it "expert", but it's generally "good enough" to be a starting place whenever I get stuck, and on several occasions it has surprised me with a complete, bug-free solution. Likewise when I ask it for web app stuff, though as that isn't my domain I wouldn't be able to tell you if the answers were "good" or "noob".

For the specific simple multiplication example given previously: https://chatgpt.com/share/6709a090-8934-8011-ae97-139b5758ad...

I do also have custom instructions set, but the critical thing here is the link to the python script, which is linked to at the end of the message, the blue text that reads: [>_]

Some problems are computationally bounded (computational complexity theory). LLMs may theoretically have unbounded pattern matching capabilities with increasingly large data sets and training, but what is the realistic limit here? If we used all of the currently available power on Earth for training, what would that LLM look like? Would that LLM's pattern matching replace humans and solve all of physics?
>and recognize patterns.

not quite.

they map certain patterns in the input data onto output data, in a fundamentally statistical way, which is why they can't really do math problems.

That's not to say that you can't train a model to do math, but to do that, you would need three things to be fundamentally different from current LLMs.

1. Map the tokens from the input representing some math to a hyperspace of conceptual math things with defined operations that you can do on them, and a way to represent the application of those operations. I.e., not just tokens "3" "+" "3" statistically mapping to "6", but "3" maps to some hyperparameter with "branching" options, "+" maps to one of those branches, and the output is run through a deterministic process.

2. Figure out how to make the models recurse on ideas, which involves some inner state of being wrong and the ability to rewind the processing steps and try new things. I.e., search.

3. Figure out how to do all of that through training.

All of that is basically teaching LLMs how to do logic, which is basically what AGI is. An AGI model will essentially function by mapping a piece of information to a knowledge graph and traversing that knowledge graph.

What is moving data around? Isn't everything in a computer moving data around? Do you mean backpropagation or something more specific?
Computation generally is partly moving data around, yes. What transformers do is learn how to move data around in a context-relevant manner. This greatly increases the expressivity of the kinds of computations they can perform over traditional deep nets.

https://lilianweng.github.io/posts/2018-06-24-attention/

https://transformer-circuits.pub/2022/in-context-learning-an...

https://transformer-circuits.pub/2021/framework/index.html#r...

> What transformers do is learn how to move data around in a context-relevant manner.

This is a misrepresentation of how transformers behave and I think you should double-check the definition before dunking on other people's works.

It's not a misinterpretation. What attention does is discover association matrices which bind locations in the context window, and these associations are context sensitive. But binding locations through an association matrix is an implementation of the concept of routing, which is just moving data.

Also, the link I gave regarding induction heads is explicitly moving data in the context window forward.
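
A small sketch of the routing claim, assuming plain single-head dot-product attention with the usual scaling and learned projections left out for brevity. The vectors are made up so that each position attends almost entirely to one other position; the output at that position is then (nearly) a copy of the other position's value, which is the sense of "moving data" used here.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Unscaled dot-product attention over lists of vectors."""
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)  # one row of the association matrix
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out

# Hand-picked toy vectors (illustrative, not learned).
keys = [[8, 0, 0], [0, 8, 0], [0, 0, 8]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Position 0 queries for position 2, position 1 for 0, position 2 for 1.
queries = [[0, 0, 8], [8, 0, 0], [0, 8, 0]]

routed = attend(queries, keys, values)
# routed[0] is approximately values[2], routed[1] approximately values[0], etc.
```

Because the queries and keys depend on the content of the context, which position gets copied where changes with the input, which is the context-sensitive part.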

I consider aggregate routing to be distinct from moving data. If the context is temporary then the "data" (weights and tokenizer) stays in place. LLMs are static, they do not move data so much as they infer from it.
>The problem is that general problem solving requires potentially arbitrary amounts of moving data

Can you expand on this thought?

Forget about solving practical problems for a second. We can just ask the LLM to simulate some arbitrary computation within its context window. But we can in principle require that the output depends on state from arbitrarily many steps in the past. You then need to "carry forward" the required data or otherwise make it available. This is what I mean by moving data. The required associations between data can extend beyond the buffer or available state.
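
A toy illustration of that, with invented names (`windowed_first`, `carried_first`) and no claim about how any real model is built. The question is "what was the first item of the stream?" One reader only sees a bounded buffer; the other explicitly carries the needed state forward.

```python
from collections import deque

def windowed_first(stream, window):
    """Answer using only the last `window` items that are still visible."""
    buf = deque(maxlen=window)
    for item in stream:
        buf.append(item)
    return buf[0]  # oldest visible item, not necessarily the true first

def carried_first(stream):
    """Answer by carrying the required state forward through every step."""
    first = None
    for i, item in enumerate(stream):
        if i == 0:
            first = item
    return first

stream = list(range(10))
print(windowed_first(stream, 4))   # 6, the true first item fell out of the buffer
print(windowed_first(stream, 16))  # 0, the buffer happens to be large enough
print(carried_first(stream))       # 0, regardless of stream length
```

However large the buffer is, a long enough stream defeats it unless the data is carried along.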
And by extension, the assumption is that animals can carry forward an unlimited amount of information from the past? I.e., humans rely on culture to "carry forward" ideas from the past that are outside their individual contextual window of experience?
I wasn't thinking in those terms, but yeah, I like that. Humans are a kind of superorganism, and part of that is due to the power of culture to shape behavior in ways that are responsive to environmental changes deep in history, beyond any individual's lifespan.