by lettergram 844 days ago
lol I think in general, LLM research traces its origins back to all the standard deep learning techniques: NNs, CNNs, LSTMs, RNNs, etc.

The release of transformers (via Google) in 2017, followed by BERT in 2018, enabled much more rapid training of models and more generalization with less data. Pretty much all of the LLMs (as you’d probably think of them) trace their origins to that work.

That said, my team was working with LSTMs & CNNs in the hundreds of millions to low billions of parameters back in 2016-2017 that were comparable to some lighter weight LLMs today.

In my opinion, the greatest strides in the space have less to do with the underlying architecture, and more to do with improved data formatting, data accessibility, and compute.