by trott
618 days ago
My feeling is that the answer is "no", in the sense that these RNNs wouldn't be able to universally replace Transformers in LLMs, even though they might be good enough in some cases and beat them in others. Here's why. A user of an LLM might give the model some long text and then say "Translate this into German, please." A Transformer can look back at its whole context. But what is an RNN to do? Although the length of its context is unlimited, the amount of information it retains about that context is bounded by whatever is in its hidden state at any given time. Relevant: https://arxiv.org/abs/2402.01032
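To make the contrast concrete, here is a rough back-of-the-envelope sketch of how much state each architecture carries after reading `seq_len` tokens. It assumes a simple RNN with one hidden vector per layer (modern recurrent variants use larger, but still fixed-size, states) and a decoder-only Transformer that caches one key and one value vector per token per layer. The layer count and width are hypothetical numbers picked for illustration, not taken from any real model.

```python
def rnn_state_floats(n_layers: int, d_hidden: int, seq_len: int) -> int:
    # Assumed simple RNN: one fixed-size hidden vector per layer.
    # seq_len is accepted but ignored, because the state does not grow
    # with the number of tokens consumed.
    return n_layers * d_hidden


def kv_cache_floats(n_layers: int, d_model: int, seq_len: int) -> int:
    # Assumed decoder-only Transformer: a key and a value vector of size
    # d_model per token per layer, so the cache grows linearly with context.
    return 2 * n_layers * d_model * seq_len


if __name__ == "__main__":
    # Hypothetical model dimensions, for illustration only.
    layers, width = 24, 2048
    for n in (1_000, 10_000, 100_000):
        print(
            f"{n:>7} tokens: RNN state = {rnn_state_floats(layers, width, n):,} floats, "
            f"KV cache = {kv_cache_floats(layers, width, n):,} floats"
        )
```

Under these assumptions the RNN's state stays at 49,152 floats whether it has read a thousand tokens or a hundred thousand, while the Transformer's cache grows in proportion to the input, which is what lets it re-read the text it is asked to translate. The cache is still finite, since it is capped by the maximum context length.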
This is no different from a Transformer, which, after all, is also bounded by a finite state; that state is just organized in a different manner.