Well, that's awkward. I didn't realise you were on HN. I'm sorry for the
personal tone of my comment. You are right that it was uncalled for.

The paper you linked is clear on the scope of its proofs and in any case it's a
very big assumption to say that "neural nets are Turing complete", when there
are scant few such proofs, compared with the large number of different
architectures (for most of which, no careful investigation of their
computational capabilities is ever done anyway). You could add a clarification
to your article.

>> Do you have an alternative viewpoint on what allows LLMs to be able to
somewhat accurately answer complicated math questions, despite lacking an
explicitly programmed math solver?

Yes, it's because they're language models. In particular, they're very powerful,
very smooth (in the statistical sense) language models trained to represent
gigantic text corpora. Their ability to produce correct answers once in a while
is not a surprise and does not need any other explanation.

Predicting what a language model (big or small) will output is another matter,
so one particular instance of generated output might be surprising in the sense
that the user won't expect it - not in the sense that the model shouldn't be
able to produce it.

In any case, it's clear that the performance of those models depends on the
prompts. Change the prompt slightly and you get a different answer, to any
question. That suggests retrieval from memory (modulo stochasticity) much more
than it suggests computation. And we know that these models are not models of
computation, so there's no question what's really going on.

When I say "retrieval from memory" I don't mean that these models memorise whole
sequences of tokens verbatim. To make a very big fudge about it, it's as if
they've memorised templates that they can then apply to questions to generate
the right answers.

I guess that still sounds magickal and mysterious if one hasn't worked with
language models before, so all I can say is, if you are really curious, and
really want to understand the specifics, you should try to learn more about
language models. I suggest the following as a starting point:

Eugene Charniak, Statistical Language Learning

https://mitpress.mit.edu/9780262531412/statistical-language-...

Dan Jurafsky and James H. Martin, Speech and Language Processing

https://web.stanford.edu/~jurafsky/slp3/

Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language
Processing

https://nlp.stanford.edu/fsnlp/

Those are rather "wax-on, wax-off", but if you want to learn Karate, that's where
to begin. Then you can go on to beat up the Transformers and win the girl.

The Charniak book in particular is small and sweet and easy to read. Start
there.