|
|
|
|
|
by pgspaintbrush
1097 days ago
|
|
Author here. First off, thank you for reading and for your thoughts.
I provided examples that I thought would be intuitive for humans to help folks understand that an understanding of the underlying phenomena is useful for next token prediction (I've added this as a note).
Could you share what part of the article came across as suggesting that LLMs "magically" acquire whatever ability helps them to predict? I'd like to make that section clearer, so that doesn't come across. Re: "LLMs are not particularly good at arithmetic". There are published results that show that LLMs using certain techniques reach close to 100% accuracy on 8-digit addition: https://arxiv.org/pdf/2206.07682.pdf.
There are also recent results from OpenAI where their model obtained solid results on high school math competition problems, which are harder than arithmetic: https://openai.com/research/improving-mathematical-reasoning...
I haven't looked into counting syllables or recognizing haikus but I bet that this is a result of tokenization and not an inability of the model to create a representation of the underlying phenomena. |
|
I'm not an expert in the field, but, there are lots of previous algorithms for predicting the next token in a series (Markov chains, autocomplete). None of them felt so much pressure to make an accurate prediction that they had no alternative but to teach themselves arithmetic! It seems what is different about LLMs (as far as the post goes) is that we can anthropomorphize them.
More seriously, I guess I just feel like a meaningful sketch of an explanation for why algorithm X (where X is LLMs in this case) for continuing a piece of text is good at problem A should involve something about X and A. Because it is clearly highly dependent on the exact values of X and A, not just whether A can be posed as a text completion problem and humans would prefer the computer learn to solve the underlying problem to produce better text. For example, it could help to imagine a mechanism by which algorithm X could solve problem A. The closest thing to a mechanism (something algorithm X, i.e. LLMs, might be doing that's special) in the post is the talk of necessity being the mother of invention and "a deeper understanding of reality simplifies next-token prediction tasks," and the suggestion that if you were an LLM you might want to use "the rules of addition."
It's true that modeling arithmetic in some way could help a LLM account for known arithmetic problems in the training data, which could help it on unseen arithmetic problems, but what problems an LLM can solve is a function of what it can model. Anything an LLM can't model or can't do, it just doesn't. LLMs are really bad at chess, for example. The patterns of digits in addition may be similar enough to the hierarchical patterns in language the LLM is modeling. But it's not clear if the LLM is using the "rules of addition" or not. As far as I know, we don't actually understand why LLMs are able to store so much factual information, produce such coherent stories, and do the specific things they can do.