by null_investor
637 days ago
This is not accurate. OpenAI and other companies could do it not entirely because of transformers, but because of hardware that can compute faster. We've had hardware upgrades, mostly led by Nvidia, that made it possible. New LLMs don't even rely that much on that older architecture; right now it's mostly about compute and the quality of data. I remember seeing some graphs showing that the whole "learning" phenomenon we see with neural nets is mostly about compute and data quality, with the model and optimizations being just the cherry on top.
Don’t they all state that they are based on the transformer architecture?

> not entirely because of transformers but because of the hardware
Kaplan et al. 2020[0] (figure 7, §3.2.1) shows that LSTMs, the leading language architecture before transformers, scaled worse because their performance plateaued quickly as context length grew.
[0]: https://arxiv.org/abs/2001.08361