by HarHarVeryFunny 1211 days ago
The technology has been a while coming. Language models have long been a research area within machine learning, with recurrent models such as RNNs and LSTMs being an earlier approach, since they allow the model to process a (language) sequence of arbitrary length.

Problems/limitations of recurrent models led to other approaches being tried, using "attention" as a way to let earlier parts of a sequence impact future prediction, culminating in the 2017 "Attention is all you need" paper, which introduced the "Transformer" architecture that all these current LLMs are based on.
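
For anyone wondering what "attention" means concretely, here's a rough sketch of the idea in plain Python. It's a toy, not how any real LLM is implemented: the function name `causal_attention` and the toy input are my own, and real Transformers add learned projection matrices, multiple heads and much more. Each position scores itself and every earlier position, turns those scores into weights, and outputs a weighted average.

```python
import math

def causal_attention(queries, keys, values):
    """Toy scaled dot-product attention with a causal mask (hypothetical helper)."""
    d = len(keys[0])
    outputs = []
    for t, q in enumerate(queries):
        # Position t may only look at positions 0..t (itself and earlier tokens).
        scores = [sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        # Softmax turns the scores into weights that sum to 1.
        m = max(scores)
        exps = [math.exp(x - m) for x in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the weighted average of the visible value vectors.
        out = [sum(w * values[s][j] for s, w in enumerate(weights))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy input: three token vectors used as queries, keys, and values alike.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_attention(x, x, x)
```

The first position can only attend to itself, so its output is just its own vector; later positions blend in everything before them. Unlike an RNN, nothing has to be squeezed through a single hidden state step by step.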

From there it was a matter of scale: scaling up the models and the amount of data they were trained on. Nobody knew how well this "Transformer" architecture could perform at scale, but early signs were promising enough to keep pushing to see how much better they could get. OpenAI in particular have been very aggressive in pushing this scaling up with their GPT-N (N=1/2/3..) models. They themselves expressed some surprise at the capabilities of GPT-2, leading to the much larger GPT-3, whose GPT-3.5 successors are the basis of ChatGPT.

Both OpenAI and others had been leery of publicly releasing these very capable LLMs for fear of ways they might be misused, but finally OpenAI released a GPT-3-family model (with a bit of human feedback polish) in the guise of the chat bot ChatGPT, which was the first time most of the public had seen what the tech was capable of.

The sudden impact of ChatGPT belies the incremental improvements that brought us to this point. It seems to have been largely because the public had never seen/experienced the steps that got us here, partly because of the highly accessible packaging of the tech as a web-based chat bot, and perhaps partly because it was released without much explanation from OpenAI as to what it is/how it works. They seem quite happy for the public to do what they've done and anthropomorphise it as being an AI assistant.