

by Vetch 735 days ago

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes. Training Terence Tao and me on the exact same math data would not yield two mathematicians of similar skill.

While it's true that memorization of properties, structure, operations and what should be applied when and where is involved, there is a much deeper component: knowing how these all relate to each other and grasping their fundamental meaning and structure. Some people seem to be wired to be better at thinking about and picking out these subtle mathematical relations from just a description or only a few examples (or to be able to do so at all, where everyone else struggles).

> I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits

It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.

From: https://arxiv.org/abs/2405.15071

> The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.

naasking 735 days ago

> Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes

Everyone starts by memorizing how to do basic arithmetic on numbers, their multiplication tables and fractions. Only some then advance to understanding why those operations must work as they do.

> It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.

Yes, I acknowledged that when I said "Composition tasks are still challenging". Comparisons and composition are both key to abstract reasoning. Clearly parametric memory and grokking have shown a fairly dramatic improvement in comparative reasoning with only a small tweak.

There is no evidence to suggest that compositional reasoning would not also fall to yet another small tweak. Maybe it will require something more dramatic, but I wouldn't bet on it. The pattern of thinking that humans are special does not have a good track record. Therefore, I find the original claim that I was responding to ("there is no AGI pathway in the current research direction") completely unpersuasive.

SonOfLilit 735 days ago

I started by understanding. I could multiply by repeat addition (each addition counted one at a time with the aid of fingers) before I had the 10x10 addition table memorized. I learned university level calculus before I had more than half of the 10x10 multiplication table memorized, and even that was from daily use, not from deliberate memorization. There wasn't a day in my life where I could recite the full table.

Maybe schools teach by memorization, but my mom taught me by explaining what it means, and I highly recommend this approach (and am a proof by example that humans can learn this way).

naasking 735 days ago

> I started by understanding. I could multiply by repeat addition

How did you learn what the symbols for numbers mean and how addition works? Did you literally just see "1 + 3 = 4" one day and intuit the meaning of all of those symbols? Was it entirely obvious to you from the get-go that "addition" was the same as counting using your fingers which was also the same as counting apples which was also the same as these little squiggles on paper?

There's no escaping the fact that there's memorization happening at some level because that's the only way to establish a common language.

SonOfLilit 734 days ago

There's a difference between memorizing meanings of words (addition is same as counting this and then the other thing, "3" means three things) and memorizing methods (table of single digit addition/multiplication to do them faster in your head). You were arguing the second, I'm a counterexample. I agree about the first, everyone learns language by memorization (some rote, some by use), but language is not math.

naasking 734 days ago

> You were arguing the second, I'm a counterexample.

I still don't think you are. Since we agree that you memorized numbers and how they are sequential, and that counting is moving "up" in the sequence, addition as counting is still memorizing a procedure based on this, not just memorizing a name: to add any two numbers, count down on one as you count up on the other until the first number reaches zero, and the number that counted up is the sum. I'm curious how you think you learned addition without memorizing this procedure (or one equivalent to it).
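To make the counting procedure concrete, here is a minimal sketch of it. The function name `add_by_counting` is made up for illustration, and the sketch assumes non-negative integers, since "count down until zero" only terminates for those.

```python
def add_by_counting(a: int, b: int) -> int:
    """Add two non-negative integers using only steps of one."""
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative integers")
    total = b
    while a > 0:     # count down on the first number...
        a -= 1
        total += 1   # ...while counting up on the other
    return total     # the number that counted up is the sum


print(add_by_counting(3, 4))
```

The only primitives used are "one more" and "one less", which is the sense in which the procedure grounds out in counting.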

Then you memorized the procedure for multiplication: given any two numbers, count down on one and add the other to itself until the counted down number reaches one. This is still a procedure that you memorized under the label "multiplication".
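The multiplication procedure in the paragraph above can be sketched the same way. The name `multiply_by_counting` is hypothetical, and the sketch assumes positive integers, because the countdown as described stops at one. It reads "add the other to itself" as "keep adding another copy of the other number to a running total".

```python
def multiply_by_counting(a: int, b: int) -> int:
    """Multiply two positive integers by counting down on `a` and adding `b`."""
    if a < 1 or b < 1:
        raise ValueError("this sketch only handles positive integers")
    total = b        # start with one copy of b
    while a > 1:     # count down on a until it reaches one
        a -= 1
        total += b   # add one more copy of b
    return total


print(multiply_by_counting(4, 3))
```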

This is exactly the kind of procedure that I initially described. Someone taught you a correct procedure for achieving some goal and gave you a name for it, and "learning math" consists of memorizing such correct procedures (valid moves in the game of math if you will). These moves get progressively more sophisticated as the math gets more advanced, but it's the same basic process.

They "make sense" to you, and you call it "understanding", because they are built on a deep foundation that ultimately grounds out in counting, but it's still memorizing procedures up and down the stack. You're just memorizing the "minimum" needed to reproduce everything else, and compression is understanding [1].

The "variation in outcomes" that the OP discussed is simply because many valid moves are possible in any given situation, just like in chess, and if you "understand" when a move is valid vs. not (e.g. you remember it), then you have an advantage over someone who just memorized specific shortcuts, which I suspect is what you are thinking I mean by memorization.

[1] https://philpapers.org/rec/WILUAC-2

dimask 732 days ago

I think you are confusing "memory" with strategies based on memorisation. Yes, memorising (i.e. putting things into memory) is always involved in learning in some way, but that is too general and not what is discussed here. "Compression is understanding" possibly to some extent, but understanding is not just compression; that would be a reduction of what understanding really is, as it involves a certain range of processes and contexts in which the understanding is actually enacted rather than purely "memorised" or applied, and that is fundamentally relational. It is so relational that it can even go deeply down to how motor skills are acquired or spatial relationships understood. It is no surprise that performance on tasks like mental rotation correlates well with mathematical skills.

Current research in early mathematical education now focuses on teaching certain spatial skills to very young kids rather than (just) numbers. Mathematics is about understanding of relationships, and that is not a detached kind of understanding that we can make into an algorithm, but deeply invested and relational between the "subject" and the "object" of understanding. Taking the subject and all the relations with the world out of the context of learning processes is absurd, because that is in the exact centre of them.

SonOfLilit 733 days ago

Sorry, I strongly disagree.

I did memorize names of numbers, but that is not essential in any way to doing or understanding math, and I can remember a time where I understood addition but did not fully understand how names of numbers work (I remember, when I was six, playing with a friend at counting up high, and we came up with some ridiculous names for high numbers because we didn't understand decimal very well yet).

Addition is a thing you do on matchsticks, or fingers, or eggs, or whatever objects you're thinking about. It's merging two groups and then counting the resulting group. This is how I learned addition works (plus the invariant that you will get the same result no matter what kind of object you happen to work with). Counting up and down is one method that I learned, but I learned it by understanding how and why it obviously works, which means I had the ability to generate variants: instead of 2+8=3+7=... I can do 8+2=9+1=..., or I can add ten at a time, etc.
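As a rough illustration of "merge two groups and count the result", here is a small sketch. The name `add_by_merging` and the example objects are made up for illustration; the point is only that the count depends on neither the kind of object nor the order of the groups.

```python
def add_by_merging(group_a: list, group_b: list) -> int:
    """Merge two groups of objects, then count the merged group one by one."""
    merged = group_a + group_b   # put the two groups together
    count = 0
    for _ in merged:             # count each object once
        count += 1
    return count


matchsticks = add_by_merging(["matchstick"] * 2, ["matchstick"] * 8)
eggs = add_by_merging(["egg"] * 8, ["egg"] * 2)
print(matchsticks, eggs)
```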

Same goes for multiplication. I remember the very simple conversation where I was taught multiplication. "Mom, what is multiplication?" "It's addition again and again, for example 4x3 is 3+3+3+3". That's it, from that point on I understood (integer) multiplication, and could e.g. wonder myself at why people claim that xy=yx and convince myself that it makes sense, and explore and learn faster ways to calculate it while understanding how they fit in the world and what they mean. (An exception is long multiplication, which I was taught as a method one day and was simple enough that I could memorize it, and it was many years before I was comfortable enough with math that whenever I did it, it was obvious to me why what I'm doing here calculates exactly multiplication. Long division is a more complex method: it was taught to me twice by my parents, twice again in the slightly harder polynomial variant by university textbooks, and yet I still don't have it memorized because I never bothered to figure out how it works nor to practice enough that I understand it).

I never in my life had an ability to add 2+2 while not understanding what + means. I did for half an hour have the same for long division (kinda... I did understand what division means, just not how the method accomplishes it) and then forgot. All the math I remember, I was taught in the correct order.

edit: a good test for whether I understood a method or just memorized it would be, if there's a step I'm not sure I remember correctly, whether I can tell which variation has to be the correct one. For example, in long multiplication, if I remembered each line has to be indented one place more to the right or left but wasn't sure which, since I understand it, I can easily tell that it has to be the left because this accomplishes the goal of multiplying it by 10, which we need to do because we had x0 and treated it as x.
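The indentation argument can be checked with a short sketch of schoolbook long multiplication. The name `long_multiply` is hypothetical, and the sketch assumes non-negative integers. Each row is the first number times one digit of the second, and shifting a row one place to the left is the same as multiplying it by 10.

```python
def long_multiply(x: int, y: int) -> int:
    """Schoolbook long multiplication for non-negative integers."""
    if x < 0 or y < 0:
        raise ValueError("this sketch only handles non-negative integers")
    total = 0
    # Walk the digits of y from least significant to most significant.
    for place, digit_char in enumerate(reversed(str(y))):
        row = x * int(digit_char)    # one row of the written method
        total += row * 10 ** place   # indent the row `place` columns to the left
    return total


print(long_multiply(123, 45) == 123 * 45)
```

Shifting rows to the right instead would divide each row by a power of ten, which cannot undo treating a digit in the tens place as if it were in the ones place.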

11101010001100 735 days ago

The point is the memorization exercise requires orders of magnitude fewer examples for bootstrapping.

naasking 735 days ago

Does it though? It's a common claim but I don't think that's been rigorously established.
