| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bondarchuk 347 days ago
	I think what you are missing is the concept of generalization. It is obviously not possible to literally recall the entire training dataset, since the model itself is much smaller than the data. So instead of memorizing all answers to all questions in the training data, which would take up too much space, the predictor learns a more general algorithm that it can execute to answer many different questions of a certain type. This takes up much less space, but still allows it to predict the answers to the questions of that type in the training data with reasonable accuracy. As you can see it's still a predictor, only under the hood it does something more complex than matching questions to definitions. Now the thing is that if it's done right, the algorithm it has learned will generalize even to questions that are not in the training data. But it's nevertheless still a next-token-predictor.