by digdugdirk 844 days ago
Yes, that's technically accurate. But I prefer to think of the entire LLM space as a new scientific field that started when OpenAI released ChatGPT.

In that context, all new research directions are valuable simply for the fact that they're expanding the foundation of the field. 5 years from now, who knows what the most effective models will use under the hood, but the more we can learn about them in general, the better.

2 comments

lol I think in general, LLM research traces its origins back to all the standard deep learning techniques: NNs, CNNs, LSTMs, RNNs, etc.

In 2017, the release of transformers (via Google) enabled much more rapid training of models and more generalization with less data. 100% of the LLMs (as you’d probably think of them) trace their origins to that work and to early transformer models like BERT (2018).

That said, my team was working with LSTMs & CNNs in the hundreds of millions to low billions of parameters back in 2016-2017 that were comparable to some lighter-weight LLMs today.

In my opinion, the greatest strides in the space have less to do with the underlying architecture and more to do with better data formatting, better data accessibility, and more compute.

The field of research here is far older than ChatGPT's release. Neural network research has been going on for at least 50 years.

Most of the research that enabled ChatGPT was also already known. "Attention is all you need" was a 2017 paper.

It is still a fast-evolving field, but not one that just kicked off.