by raviparikh 1201 days ago
Thanks for taking the time to read the article and comment. Appreciate your feedback. As you point out, my last couple of paragraphs were somewhat speculative and hand-wavy. Do you have an alternative viewpoint on what allows LLMs to somewhat accurately answer complicated math questions, despite lacking an explicitly programmed math solver? It sounds like you may be better informed than me, so I would love to hear your thoughts.

> that the author clearly didn't read. I guess there's too many scary maths for a "layman".

No need for the personal attack. I did read the paper and the math in the paper is not particularly complicated.

1 comments

Well, that's awkward. I didn't realise you were on HN. I'm sorry for the personal tone of my comment. You are right that it was uncalled for.

The paper you linked is clear on the scope of its proofs. In any case, it's a very big assumption to say that "neural nets are Turing complete" when there are so few such proofs compared with the large number of different architectures (for most of which no careful investigation of their computational capabilities is ever done anyway).

You could add a clarification to your article.

>> Do you have an alternative viewpoint on what allows LLMs to be able to somewhat accurately answer complicated math questions, despite lacking an explicitly programmed math solver?

Yes, it's because they're language models. In particular, they're very powerful, very smooth (in the statistical sense) language models trained to represent gigantic text corpora. Their ability to produce correct answers once in a while is not a surprise and does not need any other explanation.
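
To make "smooth" a bit more concrete, here is a toy sketch. It is only an illustration of the statistical idea, not how any LLM is actually built: a bigram model with add-alpha smoothing, where the corpus, the function names and the `alpha` parameter are all made up for the example. The point is that smoothing gives every continuation a non-zero probability, including ones never seen in training, so an occasional correct answer to an unseen question is unremarkable.

```python
from collections import Counter

def train_bigram(tokens):
    """Count contexts and bigrams in a list of tokens (toy example)."""
    contexts = Counter(tokens[:-1])             # how often each token is followed by something
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each adjacent pair occurs
    vocab = set(tokens)
    return contexts, bigrams, vocab

def prob(word, prev, contexts, bigrams, vocab, alpha=1.0):
    """P(word | prev) with add-alpha smoothing: unseen pairs still get some mass."""
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * len(vocab))

# Hypothetical miniature "corpus", purely for illustration.
corpus = "two plus two is four . two plus three is five .".split()
contexts, bigrams, vocab = train_bigram(corpus)

seen = prob("four", "is", contexts, bigrams, vocab)    # "is four" occurs in the corpus
unseen = prob("three", "is", contexts, bigrams, vocab)  # "is three" never occurs
total = sum(prob(w, "is", contexts, bigrams, vocab) for w in vocab)
```

Here `unseen` is smaller than `seen` but still greater than zero, and `total` sums to 1 over the vocabulary. The books listed further down cover smoothing properly.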

Predicting what a language model (big or small) will output is another matter, so one particular instance of generated output might be surprising in the sense that the user won't expect it - not in the sense that the model shouldn't be able to produce it.

In any case, it's clear that the performance of those models depends on the prompts. Change the prompt slightly and you get a different answer, to any question. That suggests retrieval from memory (modulo stochasticity) much more than it suggests computation. And we know that these models are not models of computation, so there's no question what's really going on.

When I say "retrieval from memory" I don't mean that these models memorise whole sequences of tokens verbatim. To make a very big fudge about it, it's as if they've memorised templates that they can then apply to questions to generate the right answers.

I guess that still sounds magickal and mysterious if one hasn't worked with language models before, so all I can say is, if you are really curious, and really want to understand the specifics, you should try to learn more about language models.

I suggest the following as a starting point:

Eugene Charniak, Statistical Language Learning

https://mitpress.mit.edu/9780262531412/statistical-language-...

Dan Jurafsky and James H. Martin, Speech and Language Processing

https://web.stanford.edu/~jurafsky/slp3/

Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing

https://nlp.stanford.edu/fsnlp/

Those are rather "wax-on, wax-off", but if you want to learn Karate, that's where to begin. Then you can go on to beat up the Transformers and win the girl.

The Charniak book in particular is small and sweet and easy to read. Start there.