by espadrine
633 days ago

> New LLMs don't even rely that much on that aforementioned older architecture

Don't they all state that they are based on the transformer architecture?

> not entirely because of transformers but because of the hardware

Kaplan et al. 2020 [0] (figure 7, §3.2.1) shows that LSTMs, the leading language architecture before transformers, scaled worse: their per-token loss plateaued early in the context, while transformers kept improving across the whole context.

[0]: https://arxiv.org/abs/2001.08361