Hacker News new | ask | show | jobs
by dimask 735 days ago
> How many homework questions did your entire calc 1 class have? I'm guessing less than 100 and (hopefully) you successfully learned differential calculus.

Not just that: people learn mathematics mainly by _thinking over and solving problems_, not by memorising solutions to problems. During my mathematics education I had to practice solving a lot of problems dissimilar what I had seen before. Even in the theory part, a lot of it was actually about filling in details in proofs and arguments, and reformulating challenging steps (by words or drawings). My notes on top of a mathematical textbook are much more than the text itself.

People think that knowledge lies in the texts themselves; it does not, it lies in what these texts relate to and the processes that they are part of, a lot of which are out in the real world and in our interactions. The original article is spot on that there is no AGI pathway in the current research direction. But there are huge incentives for ignoring this.

4 comments

> Not just that: people learn mathematics mainly by _thinking over and solving problems_, not by memorising solutions to problems.

I think it's more accurate to say that they learn math by memorizing a sequence of steps that result in a correct solution, typically by following along with some examples. Hopefully they also remember why each step contributes to the answer as this aids recall and generalization.

The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly. This is just standard training. Understanding the motivation of each step helps with that memorization, and also allows you to apply that step in novel problems.

> The original article is spot on that there is no AGI pathway in the current research direction.

I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits for problem solving if trained enough, and parametric memory generalizes their operation to many more tasks.

They have now been able to achieve near perfect accuracy on comparison tasks, where GPT-4 is barely in the double digit success rate.

Composition tasks are still challenging, but parametric memory is a big step in the right direction for that too. Accurate comparitive and compositional reasoning sound tantalizingly close to AGI.

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes. Me and Terence Tao on the same exact math training data would not yield two mathematicians of similar skill.

While it's true that memorization of properties, structure, operations and what should be applied when and where is involved, there is a much deeper component of knowing how these all relate to each other. Grasping their fundamental meaning and structure, and some people seem to be wired to be better at thinking about and picking out these subtle mathematical relations using just the description or based off of only a few examples (or be able to at all, where everyone else struggles).

> I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits

It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.

From: https://arxiv.org/abs/2405.15071

> The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.

> Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes

Everyone starts by memorizing how to do basic arithmetic on numbers, their multiplication tables and fractions. Only some then advance to understanding why those operations must work as they do.

> It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.

Yes, I acknowledged that when I said "Composition tasks are still challenging". Comparisons and composition are both key to abstract reasoning. Clearly parametric memory and grokking have shown a fairly dramatic improvement in comparative reasoning with only a small tweak.

There is no evidence to suggest that compositional reasoning would not also fall to yet another small tweak. Maybe it will require something more dramatic, but I wouldn't bet on it. This pattern of thinking humans are special does not have a good track record. Therefore, I find the original claim that I was responding to("there is no AGI pathway in the current research direction") completely unpersuasive.

I started by understanding. I could multiply by repeat addition (each addition counted one at a time with the aid of fingers) before I had the 10x10 addition table memorized. I learned university level calculus before I had more than half of the 10x10 multiplication table memorized, and even that was from daily use, not from deliberate memorization. There wasn't a day in my life where I could recite the full table.

Maybe schools teach by memorization, but my mom taught me by explaining what it means, and I highly recommend this approach (and am a proof by example that humans can learn this way).

> I started by understanding. I could multiply by repeat addition

How did you learn what the symbols for numbers mean and how addition works? Did you literally just see "1 + 3 = 4" one day and intuit the meaning of all of those symbols? Was it entirely obvious to you from the get-go that "addition" was the same as counting using your fingers which was also the same as counting apples which was also the same as these little squiggles on paper?

There's no escaping the fact that there's memorization happening at some level because that's the only way to establish a common language.

There's a difference between memorizing meanings of words (addition is same as counting this and then the other thing, "3" means three things) and memorizing methods (table of single digit addition/multiplication to do them faster in your head). You were arguing the second, I'm a counterexample. I agree about the first, everyone learns language by memorization (some rote, some by use), but language is not math.
The point is the memorization exercise requires orders of magnitude fewer examples for bootstrapping.
Does it though? It's a common claim but I don't think that's been rigourously established.
> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Perhaps that is how you learned math, but it is nothing like how I learned math. Memorizing steps does not help, I sucked at it. What works for me us understanding the steps and why we used them. Once I understood the process and why it worked, I was able to reason my way through it.

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly.

Did you look at the types of problems presented by the ARC-AGO test? I don't see how memorization plays any role.

> They have now been able to achieve near perfect accuracy on comparison tasks, where GPT-4 is barely in the double digit success rate.

Then lets see how they do on the ARC test? While it is possible that generalized circuits can develop in Ls with enough training but I am pretty skeptical till we see results.

> Perhaps that is how you learned math, but it is nothing like how I learned math.

Memorization is literally how you learned arithmetic, multiplication tables and fractions. Everyone starts learning math by memorization, and only later start understanding why certain steps work. Some people don't advance to that point, and those that do become more adept at math.

> Memorization is literally how you learned arithmetic, multiplication tables and fractions

I understood how to do arithmetic for numbers with multiple digits before I was taught a "procedure". Also, I am not even sure what you mean by "memorization is how you learned fractions". What is there to memorize?

> I understood how to do arithmetic for numbers with multiple digits before I was taught a "procedure"

What did you understand, exactly? You understood how to "count" using "numbers" that you also memorized? You intuitively understood that addition was counting up and subtraction was counting down, or did you memorize those words and what they meant in reference to counting?

> Also, I am not even sure what you mean by "memorization is how you learned fractions". What is there to memorize?

The procedure to add or subtract fractions by establishing a common denominator, for instance. The procedure for how numerators and denominators are multiplied or divided. I could go on.

Fractions is exactly an area of mathematics where I learned by understanding the concept and how it was represented and then would use that understanding to re-reason the procedures I had a hard time remembering.

I do have the single digit multiplication table memorized now, but there was a long time where that table had gaps and I would use my understanding of how numbers worked to to calculate the result rather than remembering it. That same process still occurs for double digit number.

Mathematics education, especially historically, has indeed leaned pretty heavily on memorization. That does mean thats the only way to learn math, or even a particularly good one. I personally think over reliance on memorization is part of why so many people think they hate math.

Every time I see people online reduce the human thinking process to just production of a perceptible output, I start questioning myself, whether somehow I am the only human on this planet capable of thinking and everyone else is just pretending. That can't be right. It doesn't add up.

The answer is that both humans and the model are capable of reasoning, but the model is more restricted in the reasoning that it can perform since it must conform to the dataset. This means the model is not allowed to invest tokens that do not immediately represent an answer but have to be derived on the way to the answer. Since these thinking tokens are not part of the dataset, the reasoning that the LLM can perform is constrained to the parts of the model that are not subject to the straight jacket of training loss. Therefore most of the reasoning occurs in-between the first and last layers and ends with the last layer, at which point the produced token must cross the training loss barrier. Tokens that invest into the future but are not in the dataset get rejected and thereby limit the ability of the LLM to reason.

> People think that knowledge lies in the texts themselves; it does not, it lies in what these texts relate to and the processes that they are part of, a lot of which are out in the real world and in our interactions

And almost all of it is just more text, or described in more text.

You're very much right about this. And that's exactly why LLMs work as well as they do - they're trained on enough text of all kinds and topics, that they get to pick up on all kinds of patterns and relationships, big and small. The meaning of any word isn't embedded in the letters that make it, but in what other words and experiences are associated with it - and it so happens that it's exactly what language models are mapping.

It is not "just more text". That is an extremely reductive approach on human cognition and experience that does favour to nothing. Describing things in text collapses too many dimensions. Human cognition is multimodal. Humans are not computational machines, we are attuned and in constant allostatic relationship with the changing world around us.
I think there is a component of memorizing solutions. For example, for mathematical proofs there is a set of standard "tricks" that you should have memorized.
Sure memory helps a lot, it allows you to concentrate your mental effort on the novel ot unique parts of the problem.