HN Mirror

by naasking 735 days ago

> Not just that: people learn mathematics mainly by _thinking over and solving problems_, not by memorising solutions to problems.

I think it's more accurate to say that they learn math by memorizing a sequence of steps that result in a correct solution, typically by following along with some examples. Hopefully they also remember why each step contributes to the answer as this aids recall and generalization.

The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly. This is just standard training. Understanding the motivation of each step helps with that memorization, and also allows you to apply that step in novel problems.

> The original article is spot on that there is no AGI pathway in the current research direction.

I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits for problem solving if trained enough, and parametric memory generalizes their operation to many more tasks.

They have now been able to achieve near perfect accuracy on comparison tasks, where GPT-4 is barely in the double digit success rate.

Composition tasks are still challenging, but parametric memory is a big step in the right direction for that too. Accurate comparative and compositional reasoning sound tantalizingly close to AGI.

2 comments

Vetch 735 days ago

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes. Me and Terence Tao on the same exact math training data would not yield two mathematicians of similar skill.

While it's true that memorization of properties, structure, operations and what should be applied when and where is involved, there is a much deeper component of knowing how these all relate to each other and grasping their fundamental meaning and structure. Some people seem to be wired to be better at thinking about and picking out these subtle mathematical relations using just the description or based on only a few examples (or to be able to do so at all, where everyone else struggles).

> I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits

It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.

From: https://arxiv.org/abs/2405.15071

> The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.

naasking 735 days ago

> Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes

Everyone starts by memorizing how to do basic arithmetic on numbers, their multiplication tables and fractions. Only some then advance to understanding why those operations must work as they do.

> It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.

Yes, I acknowledged that when I said "Composition tasks are still challenging". Comparisons and composition are both key to abstract reasoning. Clearly parametric memory and grokking have shown a fairly dramatic improvement in comparative reasoning with only a small tweak.

There is no evidence to suggest that compositional reasoning would not also fall to yet another small tweak. Maybe it will require something more dramatic, but I wouldn't bet on it. This pattern of thinking humans are special does not have a good track record. Therefore, I find the original claim that I was responding to ("there is no AGI pathway in the current research direction") completely unpersuasive.

SonOfLilit 735 days ago

I started by understanding. I could multiply by repeat addition (each addition counted one at a time with the aid of fingers) before I had the 10x10 addition table memorized. I learned university level calculus before I had more than half of the 10x10 multiplication table memorized, and even that was from daily use, not from deliberate memorization. There wasn't a day in my life where I could recite the full table.

Maybe schools teach by memorization, but my mom taught me by explaining what it means, and I highly recommend this approach (and am a proof by example that humans can learn this way).

naasking 735 days ago

> I started by understanding. I could multiply by repeat addition

How did you learn what the symbols for numbers mean and how addition works? Did you literally just see "1 + 3 = 4" one day and intuit the meaning of all of those symbols? Was it entirely obvious to you from the get-go that "addition" was the same as counting using your fingers which was also the same as counting apples which was also the same as these little squiggles on paper?

There's no escaping the fact that there's memorization happening at some level because that's the only way to establish a common language.

SonOfLilit 734 days ago

There's a difference between memorizing meanings of words (addition is same as counting this and then the other thing, "3" means three things) and memorizing methods (table of single digit addition/multiplication to do them faster in your head). You were arguing the second, I'm a counterexample. I agree about the first, everyone learns language by memorization (some rote, some by use), but language is not math.

naasking 734 days ago

> You were arguing the second, I'm a counterexample.

I still don't think you are. Since we agree that you memorized numbers and how they are sequential, and that counting is moving "up" in the sequence, addition as counting is still memorizing a procedure based on this, not just memorizing a name: to add any two numbers, count down on one as you count up on the other until the first number reaches zero, and the number that counted up is the sum. I'm curious how you think you learned addition without memorizing this procedure (or one equivalent to it).

Then you memorized the procedure for multiplication: given any two numbers, count down on one and add the other to itself until the counted down number reaches one. This is still a procedure that you memorized under the label "multiplication".
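To make the two procedures concrete, here is a minimal Python sketch of them as described above. The function names are made up for illustration, the inputs are assumed to be non-negative integers, and the zero case for multiplication is an added assumption, since the description only counts down to one.

```python
def add_by_counting(a: int, b: int) -> int:
    """Add by counting a down to zero while counting b up."""
    if a < 0 or b < 0:
        raise ValueError("this sketch assumes non-negative integers")
    while a > 0:
        a -= 1  # count down on the first number
        b += 1  # count up on the second number
    return b    # the number that counted up is the sum


def multiply_by_counting(a: int, b: int) -> int:
    """Multiply by counting a down to one, adding b to a running total each step."""
    if a < 0 or b < 0:
        raise ValueError("this sketch assumes non-negative integers")
    if a == 0:
        return 0  # assumption: the described procedure does not cover zero
    total = b
    while a > 1:
        a -= 1
        total = add_by_counting(b, total)  # multiplication built on the addition procedure
    return total


print(add_by_counting(3, 4))       # 7
print(multiply_by_counting(3, 4))  # 12
```

Note that multiplication here only reuses the addition procedure, which is the "built on a foundation of counting" structure under discussion.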

This is exactly the kind of procedure that I initially described. Someone taught you a correct procedure for achieving some goal and gave you a name for it, and "learning math" consists of memorizing such correct procedures (valid moves in the game of math if you will). These moves get progressively more sophisticated as the math gets more advanced, but it's the same basic process.

They "make sense" to you, and you call it "understanding", because they are built on a deep foundation that ultimately grounds out in counting, but it's still memorizing procedures up and down the stack. You're just memorizing the "minimum" needed to reproduce everything else, and compression is understanding [1].

The "variation in outcomes" that an OP discussed is simply because many valid moves are possible in any given situation, just like in chess, and if you "understand" when a move is valid vs. not (eg. you remember it), then you have an advantage over someone who just memorized specific shortcuts, which I suspect is what you are thinking I mean by memorization.

[1] https://philpapers.org/rec/WILUAC-2

11101010001100 735 days ago

The point is the memorization exercise requires orders of magnitude fewer examples for bootstrapping.

naasking 735 days ago

Does it though? It's a common claim but I don't think that's been rigorously established.

shkkmo 735 days ago

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Perhaps that is how you learned math, but it is nothing like how I learned math. Memorizing steps does not help, I sucked at it. What works for me is understanding the steps and why we used them. Once I understood the process and why it worked, I was able to reason my way through it.

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly.

Did you look at the types of problems presented by the ARC-AGI test? I don't see how memorization plays any role.

> They have now been able to achieve near perfect accuracy on comparison tasks, where GPT-4 is barely in the double digit success rate.

Then let's see how they do on the ARC test. While it is possible that generalized circuits can develop in LLMs with enough training, I am pretty skeptical until we see results.

naasking 735 days ago

> Perhaps that is how you learned math, but it is nothing like how I learned math.

Memorization is literally how you learned arithmetic, multiplication tables and fractions. Everyone starts learning math by memorization, and only later starts understanding why certain steps work. Some people don't advance to that point, and those that do become more adept at math.

pedrosorio 735 days ago

> Memorization is literally how you learned arithmetic, multiplication tables and fractions

I understood how to do arithmetic for numbers with multiple digits before I was taught a "procedure". Also, I am not even sure what you mean by "memorization is how you learned fractions". What is there to memorize?

naasking 735 days ago

> I understood how to do arithmetic for numbers with multiple digits before I was taught a "procedure"

What did you understand, exactly? You understood how to "count" using "numbers" that you also memorized? You intuitively understood that addition was counting up and subtraction was counting down, or did you memorize those words and what they meant in reference to counting?

> Also, I am not even sure what you mean by "memorization is how you learned fractions". What is there to memorize?

The procedure to add or subtract fractions by establishing a common denominator, for instance. The procedure for how numerators and denominators are multiplied or divided. I could go on.
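For reference, the fraction procedures mentioned above can be sketched in a few lines of Python. The function names are hypothetical, denominators are assumed to be positive integers, and reducing to lowest terms is an extra step added here for readable results.

```python
from math import gcd


def add_fractions(n1: int, d1: int, n2: int, d2: int) -> tuple[int, int]:
    """Add n1/d1 + n2/d2 by first establishing a common denominator."""
    if d1 <= 0 or d2 <= 0:
        raise ValueError("this sketch assumes positive denominators")
    common = d1 * d2 // gcd(d1, d2)  # least common denominator
    numerator = n1 * (common // d1) + n2 * (common // d2)
    g = gcd(numerator, common)       # reduce to lowest terms
    return numerator // g, common // g


def multiply_fractions(n1: int, d1: int, n2: int, d2: int) -> tuple[int, int]:
    """Multiply n1/d1 * n2/d2: numerators together, denominators together."""
    if d1 <= 0 or d2 <= 0:
        raise ValueError("this sketch assumes positive denominators")
    n, d = n1 * n2, d1 * d2
    g = gcd(n, d)                    # reduce to lowest terms
    return n // g, d // g


print(add_fractions(1, 2, 1, 3))       # (5, 6)
print(multiply_fractions(2, 3, 3, 4))  # (1, 2)
```

Subtraction follows the same common-denominator step with a negated second numerator, and division is multiplication by the swapped second fraction.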

shkkmo 734 days ago

Fractions is exactly an area of mathematics where I learned by understanding the concept and how it was represented and then would use that understanding to re-reason the procedures I had a hard time remembering.

I do have the single digit multiplication table memorized now, but there was a long time where that table had gaps and I would use my understanding of how numbers worked to calculate the result rather than remembering it. That same process still occurs for double digit numbers.

Mathematics education, especially historically, has indeed leaned pretty heavily on memorization. That doesn't mean that's the only way to learn math, or even a particularly good one. I personally think overreliance on memorization is part of why so many people think they hate math.

naasking 734 days ago

> Fractions is exactly an area of mathematics where I learned by understanding the concept and how it was represented and then would use that understanding to re-reason the procedures I had a hard time remembering.

Sure, I did that plenty too, but that doesn't refute the point that memorization is core to understanding mathematics, it's just a specific kind of memorization that results in maximal flexibility for minimal state retention. All you're claiming is that you memorized some core axioms/primitives and the procedures that operate on them, and then memorized how higher-level concepts are defined in terms of that core. I go into more detail of the specifics here:

https://news.ycombinator.com/item?id=40669585

I agree that this is a better way to memorize mathematics, eg. it's more parsimonious than memorizing lots of shortcuts. We call this type of memorizing "understanding" because it's arguably the most parsimonious approach, requiring the least memory, and machine learning has persuasively argued IMO that compression is understanding [1].

[1] https://philpapers.org/rec/WILUAC-2
