by famouswaffles 973 days ago
>Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers?

Lol researchers were surprised by the mostly incoherent nonsense pre-transformer RNNs were spouting years ago, never mind the near-perfect coherence of later GPT models. To argue otherwise is just plain revisionism.

http://karpathy.github.io/2015/05/21/rnn-effectiveness/


From your linked post:

> What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I’ve in fact reached the opposite conclusion). Fast forward about a year: I’m training RNNs all the time and I’ve witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me.

This reads more like humanizing the language of the post than any legitimate surprise from the author.

The rest of the post then goes into great detail showing that “we DO really know what happened”, to paraphrase the definition the OP provides for their use of “surprise”.

> Conclusion: We’ve learned about RNNs, how they work, why they have become a big deal, we’ve trained an RNN character-level language model on several fun datasets, and we’ve seen where RNNs are going.

I am pushing back on people conflating the innate complexity of a high dimensional polynomial with a misplaced reverence of incomprehensibility.

> In fact, it is known that RNNs are Turing-Complete in the sense that they can simulate arbitrary programs (with proper weights).

Mathematically proven to be able to do something is about as far from surprise as one can get.

>This reads more like humanizing the language of the post than any legitimate surprise from the author.

Lol Sure

>I am pushing back on people conflating the innate complexity of a high dimensional polynomial with a misplaced reverence of incomprehensibility.

We don't know what the models learn and what they employ to aid in predictions. That is a fact. Going on a gradient descent rant is funny but ultimately meaningless. It doesn't tell you anything about the meaning of the computations.

There is no misplaced reverence of incomprehensibility, because the internals and how they meaningfully shape predictions are incomprehensible.

>Mathematically proven to be able to do something is about as far from surprise as one can get.

Magic: The Gathering is Turing complete. I'm sorry, but "theoretically Turing complete" is about as meaningless as it gets. Transformers aren't even Turing complete.