by hakuseki 1275 days ago
I think that's not a bad summary, though? Perhaps you would say it is a probabilistic next-token chooser, but that just seems like a very minor distinction.

Probabilistic and Bayesian are not the same thing. Moreover, GPT, the deep-learning model, is not a probabilistic next-token chooser. You can envision many different ways to choose the next word based on GPT's output. OpenAI's API for GPT is a probabilistic word chooser paired with GPT, but GPT is the model. It generates a probability distribution for the next word, not through a Bayesian process but through something entirely different. GPT takes a vector-space representation of a sentence, projects it onto some space (we'll call it GPTThink), and then re-projects from that space to a new vector space, producing one score per word in its vocabulary. Then it applies softmax to that vector of scores to turn it into a probability distribution. That's not a Bayesian process.
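Here is a minimal sketch of the split described above, using a made-up four-word vocabulary and hand-picked weights (all hypothetical). The "model" projects an input vector into a hidden space, projects back out to one score per word, and applies softmax. Two separate choosers then pick a word from the same distribution. A real GPT replaces the two small matrices with a deep stack of transformer layers, so treat this as an illustration of the data flow, not of GPT itself.

```python
import math
import random

# Hypothetical toy vocabulary, for illustration only.
VOCAB = ["cat", "dog", "sat", "mat"]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(logits):
    """Turn a vector of real-valued scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def model(context_vec, w_in, w_out):
    """Stand-in for the model: project into a hidden space ("GPTThink"),
    re-project to one score per vocabulary word, then apply softmax."""
    hidden = [math.tanh(h) for h in matvec(w_in, context_vec)]
    logits = matvec(w_out, hidden)
    return softmax(logits)

# Two different choosers consuming the same distribution.
def greedy_choice(probs):
    """Always pick the single most probable word."""
    return VOCAB[max(range(len(probs)), key=lambda i: probs[i])]

def sample_choice(probs, rng):
    """Pick a word at random, weighted by its probability."""
    return rng.choices(VOCAB, weights=probs, k=1)[0]

# Hypothetical weights: 3-dim input -> 2-dim hidden -> 4 word scores.
W_IN = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
W_OUT = [[1.0, -1.0], [0.5, 0.5], [-0.3, 1.2], [0.0, 0.2]]
context = [1.0, 0.0, 0.5]

probs = model(context, W_IN, W_OUT)
print([round(p, 3) for p in probs])
print(greedy_choice(probs))
print(sample_choice(probs, random.Random(0)))
```

The model's output is identical in both cases; only the chooser differs, which is the distinction between GPT and the word chooser the API pairs with it.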
Better! The last sentence still sounds like "magic," but this is getting closer to my mental model of how you get from BASIC and Python to GPT.