|
|
|
|
|
by famouswaffles
973 days ago
|
|
>Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers? Lol researchers were surprised by the mostly incoherent nonsense pre-transformer RNNs were spouting years go, nevermind the near perfect coherency of later GPT models. To argue otherwise is just plain revisionism. http://karpathy.github.io/2015/05/21/rnn-effectiveness/ |
|
> What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I’ve in fact reached the opposite conclusion). Fast forward about a year: I’m training RNNs all the time and I’ve witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me.
This reads more like humanizing the language of the post then any legitimate surprise from the author.
The rest of the post then goes into great detail showing that “we DO really know what happened” to paraphrase the definition the op provides for their use of “surprise”.
> Conclusion We’ve learned about RNNs, how they work, why they have become a big deal, we’ve trained an RNN character-level language model on several fun datasets, and we’ve seen where RNNs are going.
I am pushing back on people conflating the innate complexity of a high dimensional polynomial with a misplaced reverence of incomprehensibility.
> In fact, it is known that RNNs are Turing-Complete in the sense that they can to simulate arbitrary programs (with proper weights).
Mathematically proven to be able to do something is about as far from surprise as one can get.