| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ssgh 1324 days ago
	The pantsers can also fit the model I've described. In this case GPT would keep in memory a sliding window of past N=1024 words, like it does today, but in addition to that it would remember the past N paragraph-tokens (symbols that are blurry versions of all the words in that paragraph), the past N chapter-tokens and so on. When generating words, GPT would first generate the next chapter-token, then the next paragraph-token and finally the next word-token.