by espadrine 633 days ago

> New LLMs don't even rely that much on that aforementioned older architecture

Don’t they all describe themselves as being based on the transformer architecture?

> not entirely because of transformers but because of the hardware

Kaplan et al. 2020[0] (figure 7, §3.2.1) show that LSTMs, the leading language-modeling architecture prior to transformers, scaled worse because their performance plateaued quickly as the context grew longer.

[0]: https://arxiv.org/abs/2001.08361