Hacker News new | ask | show | jobs
by YeGoblynQueenne 1208 days ago
>> But this program is representable by a neural net; after all, neural nets are turing complete. [1]

This is indeed evidence of an interesting phenomenon. It seems that many of the hare-brained things that people say lately are conclusions they have drawned starting from the premise that neural nets are somehow magickal and mysterious, and so they can do anything and everything anyone could imagine, and we don't even really need to come up with any other explanation about those wonders, than "it's a neural net!".

So, for example, the author can claim that "there’s some sort of fuzzy arithmetic engine at the heart of GPT-3", without having to explain what, exactly, is a "fuzzy arithmetic engine" (it's just "some sort" of thing, who cares?) and why we need such a device to explain the behaviour of a language model.

Then again, what's the point? People write stuff on the internets. Now we have language models trained on that nonsense. Things can only get worse.

_______________

[1] The link in the article points to a paper on the computational capabilities of Recurrent Neural Nets (RNNs), not "neural nets" in general. The Transformer architecture, used to train GPT-3's model is not an RNN architecture. In any case, the linked paper, and papers like it, only show that one can simulate any Turing machine by a specially constructed net. To learn a neural net that simulates any Turing machine (i.e. without hand-crafting) one would have to train it on Turing machines; and probably all Turing machines. GPT-3's model, besides not being an RNN, was trained on text, not Turing machines, so there's a few layers of strong assumptions needed before one can claim that it somehow, magickally, turned into a model of a Turing machine.

Anyway, the Turing-complete networks discussed in the linked paper, and similar work, inherit the undecidability of Universal Turing Machines and so it is impossible to predict the value of any activation function at any point in time. Which means that, if a neural net ever really went Turing complete, we wouldn't be able to tell whether its training has converged, or if it ever will. So that's an interesting paper- that the author clearly didn't read. I guess there's too many scary maths for a "layman". Claiming that GPT-3 has "some sort of fuzzy arithmetic engine" doesn't need any maths.

2 comments

Thanks for taking the time to read the article and comment. Appreciate your feedback. As you point out, my last couple paragraphs were somewhat speculative and handwav-y. Do you have an alternative viewpoint on what allows LLMs to be able to somewhat accurately answer complicated math questions, despite lacking an explicitly programmed math solver? It sounds like you may be better informed than me–would love to hear your thoughts.

> that the author clearly didn't read. I guess there's too many scary maths for a "layman".

No need for the personal attack. I did read the paper and the math in the paper is not particularly complicated.

Well, that's awkward. I didn't realise you were on HN. I'm sorry for the personal tone of my comment. You are right that it was uncalled for.

The paper you linked is clear on the scope of its proofs and in any case it's a very big assumption to say that "neural nets are Turing complete", when there are scant few such proofs, compared with the large number of different architectures (for most of which, no careful investigation of their computational capabilities is ever done anyway).

You could add a clarification to your article.

>> Do you have an alternative viewpoint on what allows LLMs to be able to somewhat accurately answer complicated math questions, despite lacking an explicitly programmed math solver?

Yes, it's because they're language models. In particular, they're very powerful, very smooth (in the statistical sense) language models trained to represent gigantic text corpora. Their ability to produce correct answers once in a while is not a surprise and does not need any other explanation.

Predicting what a language model (big or small) will output is another matter, so one particular instance of generated output might be surprising in the sense that the user won't expect it - not in the sense that the model shouldn't be able to produce it.

In any case, it's clear that the performance of those models depends on the prompts. Change the prompt slightly and you get a different answer, to any question. That suggests retrieval from memory (modulo stochasticity) much more than it suggests computation. And we know that these models are not models of computation, so there's no question what's really going on.

When I say "retrieval from memory" I don't mean that these models memorise whole sequences of tokens verbatim. To make a very big fudge about it, it's as if they've memorised templates that they can then apply to questions to generate the right answers.

I guess that still sounds magickal and mysterious if one hasn't worked with language models before, so all I can say is, if you are really curious, and really want to understand the specifics, you should try to learn more about language models.

I suggest the following as a starting point:

Eugene Charniak, Statistical Language Learning

https://mitpress.mit.edu/9780262531412/statistical-language-...

Dan Jurafsky and James H. Martin, Speech and Language Processing

https://web.stanford.edu/~jurafsky/slp3/

Chris Manning and Hinrich Schűtze, Foundations of Statistical Natural Language Processing

https://nlp.stanford.edu/fsnlp/

Those are rather "wax-on, wax-off", but if you want to learn Karate, that's where to begin. Then you can go on to beat up the Transformers and win the girl.

The Charniak book in particular is small and sweet and easy to read. Start there.

On the one hand, the GPT3 model is not Turing complete because run-time of each invocation is linear in the input size (number of tokens). There is no input that will generate an infinite output, for example.

On the other hand, if you run each invocation on the output of the previous invocation, it seems plausible that you could give it a prompt with a description of a Turing machine and have it simulate indefinitely. In this way of looking at it, GPT3 only encodes a transition table of a Turing machine. we only have to believe that one can code arbitrary transition tables inside GPT3, be it through an initial prompt or through manual adjustment of the internal weights.

It’s pretty obvious that it can not correctly represent arbitrary transition tables: Just construct a Turing Machine that has a larger transition table than can be encoded in the language model.

On the other hand this is just a theoretical argument. Every existing computer is also not a Turing Machine as it has finite memory.