by xg15
1213 days ago
> The thing doesn't even have a persistent thought from one token to the next - every output is a fresh prediction using only the text before it.

Using all the tokens before it. I think too many people believe that "word prediction model" implies "Markov chain from the 90s" and are calming themselves with a false sense of security based on that impression. "It just predicts the next token based on the previous tokens" doesn't really tell us much, because it leaves completely open how the prediction is done, and that algorithm can be arbitrarily complex.

> It can't even plan two tokens ahead.

No, but it can look two tokens back. For example, you could imagine an algorithm that formulates a longer response in memory, then returns only the first token of it and "forgets" the rest, repeating this for every token. That would let the model "think ahead" while still matching the "API" of only predicting the next token, with the output as the only persistent state.
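
To make the "plan ahead, emit one token, forget the rest" thought experiment concrete, here is a minimal Python sketch. It assumes a hypothetical planner, `plan_full_response`, standing in for whatever a real model computes internally; the fixed `PLAN` sentence is purely illustrative and nothing here is how any actual LLM is implemented. The only state carried between steps is the output list.

```python
from typing import List

# Hypothetical sentence the toy planner "intends" to produce (illustrative only).
PLAN = ["the", "cat", "sat", "on", "the", "mat", "<eos>"]


def plan_full_response(prefix: List[str]) -> List[str]:
    """Hypothetical planner: drafts the whole remaining response from the prefix.

    It inspects ALL prior tokens, not just the last one, to locate itself in
    the plan. A bigram Markov chain seeing only the last token "the" could not
    tell whether "cat" or "mat" comes next.
    """
    if prefix != PLAN[: len(prefix)]:
        return ["<eos>"]  # prefix diverged from the plan; just stop
    return PLAN[len(prefix):]


def next_token(prefix: List[str]) -> str:
    """Same interface as 'predict the next token', but plans several tokens internally."""
    draft = plan_full_response(prefix)  # multi-token plan held only in local memory
    return draft[0] if draft else "<eos>"  # the rest of the draft is discarded on return


def generate(max_tokens: int = 20) -> List[str]:
    output: List[str] = []  # the only persistent state between steps
    for _ in range(max_tokens):
        tok = next_token(output)
        output.append(tok)
        if tok == "<eos>":
            break
    return output


if __name__ == "__main__":
    print(" ".join(generate()))
```

Each call re-derives the full plan from scratch and throws most of it away, which is wasteful, but from the outside it is indistinguishable from a model that only ever "predicts the next token".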