|
|
|
|
|
by just_boost_it
948 days ago
|
|
For each token, the model is run again from scratch on the sentence too, so any memory lasts just long enough to generate (a little less than) a word. The next word is generated by a model with a slightly different state because the last word is now in the past. |
|
Who’s to say the the me of yesterday _is_ the same as the me of today? I don’t even remember what that guy had for breakfast. I’m in a very different state today. My training data has been updated too.