by null_investor 637 days ago
This is not accurate. OpenAI and other companies could do it not entirely because of transformers but because of the hardware, which can now compute much faster.

We've had hardware upgrades, mostly led by NVIDIA, that made it possible.

New LLMs don't even rely that much on that aforementioned older architecture; right now it's mostly about compute and the quality of data.

I remember seeing some graphs showing that the whole "learning" phenomenon we see with neural nets is mostly about compute and quality of data, with the model and optimizations just being the cherry on the cake.

2 comments

> New LLMs don't even rely that much on that aforementioned older architecture

Don't they all state that they're based on the transformer architecture?

> not entirely because of transformers but because of the hardware

Kaplan et al. 2020[0] (figure 7, §3.2.1) shows that LSTMs, the leading language architecture prior to transformers, scaled worse because they plateaued quickly with longer context.

[0]: https://arxiv.org/abs/2001.08361
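
For anyone wondering what "scaled" means there: as I read it, the paper fits transformer test loss as a power law in non-embedding parameter count, L(N) = (N_c / N)^alpha_N. Here is a rough sketch of that form. The function names are mine, and the constants (alpha_N ~ 0.076, N_c ~ 8.8e13) are the approximate values I recall from the paper's summary, so treat them as assumptions and check them against [0]. This only covers the parameter-count fit in the data-rich regime, not the LSTM comparison itself.

```python
# Sketch of the parameter-count power law from Kaplan et al. 2020.
# ASSUMPTION: constants are approximate values recalled from the paper's
# summary; verify against the source before relying on them.
ALPHA_N = 0.076   # assumed power-law exponent
N_C = 8.8e13      # assumed critical scale, in non-embedding parameters


def loss_from_params(n_params: float, n_c: float = N_C, alpha_n: float = ALPHA_N) -> float:
    """Predicted loss L(N) = (N_c / N) ** alpha_N when data is not the bottleneck."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (n_c / n_params) ** alpha_n


def loss_ratio_for_scale_up(factor: float, alpha_n: float = ALPHA_N) -> float:
    """Multiplicative change in predicted loss when N grows by `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor ** (-alpha_n)


if __name__ == "__main__":
    for n in (1e8, 1e9, 1e10):
        print(f"N = {n:.0e}: predicted loss {loss_from_params(n):.3f}")
    print(f"10x more parameters multiplies loss by {loss_ratio_for_scale_up(10):.3f}")
```

Under those assumed constants, each 10x in parameters multiplies the predicted loss by 10^-0.076, about 0.84. The curve only keeps bending down if the architecture keeps benefiting from scale, which is the point of the LSTM comparison in figure 7.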

Also, this sort of thing couldn't be done in the 80s or 90s, because it was much harder to compile that much data.