|
|
|
|
|
by ssgh
1276 days ago
|
|
The pantsers can also fit the model I've described. In this case GPT would keep in memory a sliding window of past N=1024 words, like it does today, but in addition to that it would remember the past N paragraph-tokens (symbols that are blurry versions of all the words in that paragraph), the past N chapter-tokens and so on. When generating words, GPT would first generate the next chapter-token, then the next paragraph-token and finally the next word-token. |
|