


	by bentt 975 days ago
	Are there methods to "summarize what they've learned" and then replace the context window with the shorter version? This seems like pretty much what we do as humans anyway... we need to encode our experiences into stories to make any sense of them. A story is a compression and symbolization of the raw data one experiences.
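A rough sketch of the "summarize, then replace the context" idea, just to make it concrete. Everything here is an assumption for illustration: `count_tokens` is a crude word counter rather than a real tokenizer, and `toy_summarize` is a hypothetical placeholder for whatever model call would actually write the summary.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())


def compact_context(
    messages: List[str],
    summarize: Callable[[str], str],
    budget: int,
    keep_recent: int = 2,
) -> List[str]:
    """Replace older messages with one summary once the context exceeds `budget`."""
    total = sum(count_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages  # still fits, or nothing old enough to compress
    split = len(messages) - keep_recent
    old, recent = messages[:split], messages[split:]
    summary = summarize("\n".join(old))
    # The summary takes the place of every older message; recent ones stay verbatim.
    return ["[summary] " + summary] + recent


def toy_summarize(text: str) -> str:
    # Hypothetical placeholder for an LLM call: keeps only the first 8 words.
    return " ".join(text.split()[:8])


history = ["one two three four five six seven eight nine ten"] * 5
compacted = compact_context(history, toy_summarize, budget=20)
print(len(history), "->", len(compacted), "messages")
```

The lossy part is visible right in `toy_summarize`: whatever the summarizer drops is gone for good, which is the trade-off against simply having a longer window.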

3 comments

filterfiber 975 days ago

Yeah that's a fairly well studied one. Most of these techniques are rather "lossy" compared to extending the context window. The most likely "real solution" is going to be using various tricks and finetuning on higher context lengths to just extend the context window.

Here's a bunch of other related methods:

Summarizing context - https://arxiv.org/abs/2305.14239

continuous finetuning - https://arxiv.org/pdf/2307.02839.pdf

retrieval augmented generation - https://arxiv.org/abs/2005.11401

knowledge graphs - https://arxiv.org/abs/2306.08302

augmenting the network with a side network - https://arxiv.org/abs/2306.07174

another long term memory technique - https://arxiv.org/abs/2307.02738
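To give a feel for the retrieval flavor of these methods, here is a toy sketch: instead of keeping everything in the window, store notes outside it and pull in only the most relevant ones per query. This is not the method from any of the papers above. The bag-of-words "embedding" is an assumption standing in for a learned embedding model, and all function names are made up for the example.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Assumption: word counts stand in for a real learned embedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def retrieve(query: str, notes: List[str], k: int = 2) -> List[str]:
    # Rank stored notes by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(notes, key=lambda n: cosine(q, embed(n)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, notes: List[str], k: int = 2) -> str:
    # Only the retrieved notes enter the context window, not the whole store.
    context = "\n".join(retrieve(query, notes, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


notes = [
    "the cat sat on the mat",
    "rotary position embeddings extend context",
    "bread recipe with yeast",
]
print(build_prompt("how do rotary embeddings work", notes, k=1))
```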


montyanderson 975 days ago

this is a fantastically useful comment. thank you filterfiber :)


benterix 975 days ago

Is there a realistic way to actually increase the context window?


filterfiber 975 days ago

Yes! The obvious answer is to just increase the number of positions and train for that. This requires a ton of memory, however (attention memory grows with the square of the context length), so most are currently training at 4k/8k and then finetuning at longer lengths, similar to many of the image models.
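To put numbers on the "squared" part, here's a back-of-the-envelope sketch. It assumes a naive attention implementation that materializes the full score matrix for every head in one layer, with 32 heads and 2-byte (fp16) values; those figures are illustrative assumptions, not taken from any particular model.

```python
def attention_score_bytes(context_len: int, n_heads: int = 32, bytes_per_value: int = 2) -> int:
    # One (context_len x context_len) score matrix per head, for a single layer.
    return n_heads * context_len * context_len * bytes_per_value


for n in (4096, 8192, 32768):
    gib = attention_score_bytes(n) / 2**30
    print(f"{n:>6} tokens: {gib:6.1f} GiB of attention scores per layer")
```

Doubling the context length quadruples this term, which is why training directly at long lengths gets expensive so fast.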

However, there's been some work to "get extra mileage" out of the current models, so to speak, with rotary positions and a few other tricks. These, in combination with finetuning, are the current method many are using at the moment IIRC.

Here's a decent overview https://aman.ai/primers/ai/context-length-extension/

RoPE position interpolation - https://arxiv.org/abs/2306.15595

YaRN (based on RoPE) - https://arxiv.org/pdf/2309.00071.pdf

LongLoRA - https://arxiv.org/pdf/2309.12307.pdf
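For intuition on the rotary trick, here's a minimal sketch of linear position interpolation as I understand it: rather than feeding the model positions it never saw in training, you squeeze the longer sequence's positions back into the trained range. The helper names, the adjacent-pair rotation layout, and the 2048 -> 8192 lengths are assumptions for illustration; real implementations differ in details and still need some finetuning afterwards.

```python
import math
from typing import List


def rope_angles(position: int, head_dim: int, base: float = 10000.0, scale: float = 1.0) -> List[float]:
    # One rotation angle per pair of dimensions; scale < 1 compresses positions.
    pos = position * scale
    return [pos / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


def apply_rope(vec: List[float], position: int, base: float = 10000.0, scale: float = 1.0) -> List[float]:
    # Rotate each adjacent pair (x, y) of the vector by its position-dependent angle.
    out: List[float] = []
    for i, theta in enumerate(rope_angles(position, len(vec), base, scale)):
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out


trained_len, target_len = 2048, 8192   # assumed lengths for the example
scale = trained_len / target_len       # 0.25

# Position 8000 in the long sequence gets the angles of position 2000,
# which falls inside the range the model was trained on.
print(rope_angles(8000, 8, scale=scale) == rope_angles(2000, 8))
```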

The bottleneck is quickly going to be inference. Since the current transformer models need memory proportional to the context length squared, the memory requirements go up very quickly. IIRC a 4090 can _barely_ fit a 4-bit 30B model in memory with a 4096-token context length.
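A quick sanity check on that 4090 figure, reading it as a 4096-token context. All the model numbers below are assumptions: a LLaMA-style 30B layout with 60 layers and a hidden size of 6656, an fp16 KV cache, and 24 GB of VRAM. Note the KV cache itself grows linearly with context; it's the attention score computation that grows quadratically. Activations and framework overhead are ignored.

```python
def weights_gb(n_params: float, bits_per_weight: int) -> float:
    # Memory for the quantized weights alone.
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(context_len: int, n_layers: int, hidden_dim: int, bytes_per_value: int = 2) -> float:
    # Keys and values (factor of 2) for every layer and every cached token.
    return 2 * n_layers * hidden_dim * bytes_per_value * context_len / 1e9


VRAM_GB = 24                      # assumed: RTX 4090
N_LAYERS, HIDDEN = 60, 6656       # assumed: LLaMA-style 30B layout

for ctx in (4096, 8192):
    total = weights_gb(30e9, 4) + kv_cache_gb(ctx, N_LAYERS, HIDDEN)
    verdict = "fits" if total < VRAM_GB else "does not fit"
    print(f"{ctx} tokens: {total:.1f} GB -> {verdict} in {VRAM_GB} GB")
```

Under these assumptions the weights take about 15 GB and the cache about 6.5 GB at 4096 tokens, so "barely fits" looks about right, and doubling the context already pushes it over.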

From my understanding, some form of RNN is likely to be the next step for longer context. See RWKV as an example of a decent RNN: https://arxiv.org/abs/2305.13048


abstrct 975 days ago

I’ve absolutely explored this idea but, similar to lossy compression, sometimes important nuance is lost in the process. There is both an art and science to recalling the gently compacted information and being able to recognize when it needs to be repeated back.


adammichaelc 975 days ago

If there was something like Objects in OO programming, but for LLMs, would that solve this?

Like a Topic-based Personality Construct where the model first determines which of its “selves” should answer the question, and then grabs appropriate context given the situation.
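Something like this toy sketch is what I have in mind. The personas, keywords, and keyword-overlap scoring are all made up for illustration; a real version would presumably let the model itself (or a classifier) pick the "self" instead of counting keywords.

```python
import re

# Hypothetical personas: each "self" has trigger keywords and its own stored context.
PERSONAS = {
    "cooking": {
        "keywords": {"recipe", "bake", "oven", "yeast"},
        "context": "Notes this persona keeps about cooking.",
    },
    "coding": {
        "keywords": {"python", "bug", "compile", "function"},
        "context": "Notes this persona keeps about programming.",
    },
}
DEFAULT = "general"


def pick_persona(question: str) -> str:
    # Choose the persona whose keywords overlap most with the question.
    words = set(re.findall(r"[a-z]+", question.lower()))
    best, best_score = DEFAULT, 0
    for name, persona in PERSONAS.items():
        score = len(words & persona["keywords"])
        if score > best_score:
            best, best_score = name, score
    return best


def build_prompt(question: str) -> str:
    # Only the chosen persona's context is loaded into the prompt.
    name = pick_persona(question)
    context = PERSONAS.get(name, {}).get("context", "")
    return f"[{name}] {context}\n\nQuestion: {question}"


print(build_prompt("Why does my Python function have a bug?"))
```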


lukebuehler 975 days ago

look up "frames", it's an old concept and also influenced OOP.


nonameiguess 975 days ago

The animal brain equivalent isn't to summarize a context window to account for limited working memory. It's to never leave training mode to go into inference-only mode. The learned models in animal brains never stop learning.

There is nothing stopping someone from keeping an LLM in online-training mode forever. We don't do that because it's economically infeasible, not because it wouldn't work.
