by Joeri
1101 days ago
I’m not sure I agree, but I’ll do my best to argue the position. Neural network algorithms have been around for decades. What made LLMs feasible were hardware advances that reached unprecedented scale, just barely making it possible (at great cost) to train today’s LLMs. The transformer architecture might be called a software innovation, but RWKV Raven achieves performance similar to transformers and is built on decades-old RNN technology. So hardware was far more instrumental than software in achieving LLMs.

Counter to that argument: had Google not done neural network research for Google Translate, and had it not shown in its “Attention Is All You Need” paper that the transformer approach scaled and performed well, people wouldn’t have spent the money to train foundation LLMs, and we would not be having this discussion. By that reasoning, the software mattered more than the hardware.

In reality, I think it’s a little bit of both.

Even though Google missed the opportunity, OpenAI released GPT-1 in 2018.
Then it scaled up the parameter count with each subsequent version.