by Joeri 1101 days ago
I’m not sure I agree, but I’ll do my best to argue the position.

The neural net algorithms have been around for decades. What made LLMs feasible were the hardware advances that enabled unprecedented scale, just barely making it possible (at great cost) to train today’s LLMs. The transformer architecture might be called a software innovation, but RWKV Raven gets similar performance to transformers and is built on decades-old RNN technology. So hardware was far more instrumental than software in achieving LLMs.

Counter to that argument: had Google not done neural net research for Google Translate and shown in their “Attention Is All You Need” paper that the transformer approach scaled and performed well, people wouldn’t have spent the money to train foundation LLMs and we would not be having this discussion. So the software really mattered more than the hardware.

In reality I think it’s a little bit of both.

3 comments

That's not necessarily true. The algorithm (transformers) was published by Google in 2017.

Even though Google missed the opportunity, OpenAI released GPT-1 in 2018.

Then they incrementally added parameters in the next versions.

Great take. It is really a mix of several factors, each one leveraging the other, and your arguments are great.
I don’t think I totally agree either, but your argument for the position definitely has some merit.