| HN Mirror


by nullc 2104 days ago

With GPT3 you can give it a bit more long term memory by priming it with text that has "self commentary" written and repeated through each paragraph.

[This is a post to Hacker news and I'm making a point to explain a gimmick for giving GPT3 self generated longer term 'memory'.]

Most obvious forms of memory have the problem that they aren't differentiable, so you can't train with them in place. This idea works around the issue because English text sometimes contains things like running commentary, so a model trained on it already has some idea of how to use it.

[This is a post to Hacker news and I'm making a point to explain a gimmick for giving GPT3 self generated longer term 'memory' and the limitations of other approaches.]
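A rough sketch of how such a priming text could be assembled, assuming you already have the paragraphs and a note for each one. The function name `prime_with_commentary` and the example strings are made up for illustration; only the bracketed-note layout comes from this post.

```python
def prime_with_commentary(paragraphs, notes):
    """Interleave each paragraph with a bracketed running note.

    Hypothetical helper: the layout mirrors this post, where every
    paragraph is followed by a "[...]" line restating what is going on.
    """
    if len(paragraphs) != len(notes):
        raise ValueError("need exactly one note per paragraph")
    parts = []
    for paragraph, note in zip(paragraphs, notes):
        parts.append(paragraph.strip())
        parts.append(f"[{note.strip()}]")
    # End with a blank line so the model continues with a fresh paragraph.
    return "\n\n".join(parts) + "\n\n"


prompt = prime_with_commentary(
    ["First paragraph of the priming text.", "Second paragraph."],
    ["A note about the first paragraph.", "A note about both paragraphs so far."],
)
```

The resulting `prompt` would be sent to the model as the start of the text; the hope is that it keeps emitting its own bracketed notes as it goes, and those notes carry context forward.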

I've had some success getting this to help generate better text. I wonder, though, whether it would be effective to generate a new training corpus this way. E.g. get GPT3 to generate annotations for arbitrary input text using some summarization prompt, then use that to augment the entire training corpus with the summaries injected inline, like virtual thought bubbles, with beginning and ending symbols that don't occur in the training material. The network is then retrained on this augmented data and can generate its own prompts.
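The augmentation step might look something like the sketch below. Everything named here is an assumption: the two delimiter characters are arbitrary picks for "symbols that don't occur in the training material", and `toy_summarize` is a stand-in for a real model call with a summarization prompt.

```python
from typing import Callable, Iterable

# Hypothetical delimiters; any symbols absent from the corpus would do.
THOUGHT_OPEN = "\u27e6"
THOUGHT_CLOSE = "\u27e7"


def augment(paragraphs: Iterable[str], summarize: Callable[[str], str]) -> str:
    """Inject a summary of the text so far after every paragraph."""
    out = []
    history = []
    for paragraph in paragraphs:
        if THOUGHT_OPEN in paragraph or THOUGHT_CLOSE in paragraph:
            raise ValueError("delimiter already occurs in the training text")
        history.append(paragraph)
        out.append(paragraph)
        # The summary covers everything seen so far, not just this paragraph.
        summary = summarize(" ".join(history))
        out.append(f"{THOUGHT_OPEN}{summary}{THOUGHT_CLOSE}")
    return "\n\n".join(out)


def toy_summarize(text: str) -> str:
    # Placeholder for a model-generated summary: just the first five words.
    return " ".join(text.split()[:5])


augmented = augment(
    ["alpha beta gamma delta epsilon zeta", "eta theta"],
    toy_summarize,
)
```

Summarizing the whole history rather than only the latest paragraph is one possible choice; it is what makes the injected text act like memory instead of a per-paragraph caption.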

Bonus: the operator could be given access to the normally hidden "internal monologue" text, to increase control over the output or to understand more about the state of the model.
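Separating the hidden text from the visible output could be as simple as the sketch below, assuming the thought bubbles are wrapped in a pair of delimiter symbols. The specific delimiters and the name `split_monologue` are hypothetical.

```python
import re

# Hypothetical delimiters marking the model's inline "thought bubbles".
THOUGHT_OPEN = "\u27e6"
THOUGHT_CLOSE = "\u27e7"

_THOUGHT = re.compile(
    re.escape(THOUGHT_OPEN) + r"(.*?)" + re.escape(THOUGHT_CLOSE),
    re.DOTALL,
)


def split_monologue(generated: str):
    """Return (visible_text, thoughts) from raw model output."""
    thoughts = [t.strip() for t in _THOUGHT.findall(generated)]
    visible = _THOUGHT.sub("", generated)
    # Removing a bubble leaves extra blank lines behind; collapse them.
    visible = re.sub(r"\n{3,}", "\n\n", visible).strip()
    return visible, thoughts


raw = "Hello." + "\n\n" + THOUGHT_OPEN + "greeting done" + THOUGHT_CLOSE + "\n\n" + "Bye."
visible, thoughts = split_monologue(raw)
```

Readers would see only `visible`, while the operator could inspect `thoughts`, or edit a bubble in the raw text before generation continues to steer the output.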

You can't differentiate across the different executions, due to sampling, but perhaps you don't need to: GPT3 doesn't do any gradient descent to perform one-shot learning either.

I am guessing that this must not work at scale, because it's an obvious enough idea. A similar approach for database access must also have been tried, but I've never heard anyone report it working: have the model generate keywords from the text, inject tokens encoding some text search results for those keywords into the stream, then skip over them in training and just keep them as context, thus training a model that can use a search to improve its results.
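To make the database-access variant concrete, here is a toy sketch of building one training sequence with a loss mask. All of it is assumed for illustration: the `<results>` marker tokens, whitespace tokenization, the dictionary standing in for a search index, and the keyword extractor (which in the idea above would be the model itself).

```python
from typing import Callable, List, Tuple


def inject_search_results(
    chunks: List[str],
    extract_keywords: Callable[[str], List[str]],
    search: Callable[[List[str]], str],
) -> Tuple[List[str], List[bool]]:
    """Build (tokens, loss_mask); mask is False on injected tokens."""
    tokens: List[str] = []
    loss_mask: List[bool] = []
    for chunk in chunks:
        chunk_tokens = chunk.split()
        tokens += chunk_tokens
        loss_mask += [True] * len(chunk_tokens)  # train on the real text
        result_tokens = search(extract_keywords(chunk)).split()
        injected = ["<results>"] + result_tokens + ["</results>"]
        tokens += injected
        loss_mask += [False] * len(injected)  # context only, no loss
    return tokens, loss_mask


# Toy stand-ins for the keyword generator and the text search.
index = {"transformers": "attention is all you need"}


def toy_keywords(chunk: str) -> List[str]:
    return [w for w in chunk.lower().split() if w in index]


def toy_search(keywords: List[str]) -> str:
    return " ".join(index[k] for k in keywords)


tokens, loss_mask = inject_search_results(
    ["Transformers scale well"], toy_keywords, toy_search
)
```

During training the loss would be computed only where `loss_mask` is true, so the model is never asked to predict the search results but can still attend to them when predicting the text that follows.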