| HN Mirror

by rahen 65 days ago

That's an idea I had a few months ago: whenever the KV cache nears capacity and goes through a compaction, accumulate the compacted knowledge into a dataset, then fine-tune a LoRA on it during offline hours.

This would create a three-layer memory system:

- Stable long-term memory (initial base weights)

- Mid-term memory built from the compactions and replay buffers

- Short-term memory (KV cache)

Sleeping would just be a fancy term for consolidating and transferring information from one memory layer to another during offline hours. Maybe that's also what the brain does while sleeping.
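
A minimal sketch of the three layers described above, in plain Python with no ML library. Every name here (`MemorySystem`, `observe`, `compact`, `sleep`) is hypothetical. The compaction is a stand-in that just joins strings, and the offline LoRA fine-tuning is reduced to moving records from one list to another, so this only illustrates how the information would flow between layers.

```python
from dataclasses import dataclass, field


@dataclass
class MemorySystem:
    """Toy model of the three memory layers; all names are illustrative."""
    kv_capacity: int = 4                                     # stand-in for the KV cache size limit
    short_term: list[str] = field(default_factory=list)      # short-term: the KV cache
    replay_buffer: list[str] = field(default_factory=list)   # compactions waiting for "sleep"
    mid_term: list[str] = field(default_factory=list)        # what the LoRA would have absorbed

    def observe(self, item: str) -> None:
        """Add an item to short-term memory, compacting when it fills up."""
        self.short_term.append(item)
        if len(self.short_term) >= self.kv_capacity:
            self.compact()

    def compact(self) -> None:
        """Stand-in for a real summarization step: join everything into one record."""
        summary = " | ".join(self.short_term)
        self.replay_buffer.append(summary)
        self.short_term = [f"summary: {summary}"]

    def sleep(self) -> int:
        """Stand-in for offline LoRA fine-tuning: drain the replay buffer into mid-term memory."""
        consolidated = len(self.replay_buffer)
        self.mid_term.extend(self.replay_buffer)
        self.replay_buffer.clear()
        return consolidated
```

The long-term layer (the base weights) does not appear in the sketch because nothing ever writes to it, which is the point of calling it stable.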

2 comments

chermi 65 days ago

Wouldn't that just accelerate collapse? How much do you trust the outputs of the LLM to provide trustworthy and valuable new information? I mean, I understand distillation works, but that's much more structured and thoughtful than my sessions, at least.

jack_pp 65 days ago

We can trust the feedback we give it based on the output it provides.

ambicapter 65 days ago

What kind of feedback are you giving? What's the reward function?

jack_pp 65 days ago

Right now, no feedback, since I don't run this system, but our workflows could change to accommodate it.

rahen 65 days ago

I was thinking of curated replay buffers, which would act like "dreams". To prevent collapse, the offline dataset would mix the new mid-term data with a baseline of anchor data (the original training distribution) so the model doesn't drift.
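
As a rough sketch of that mixing step: the function below builds an offline dataset in which a chosen fraction comes from anchor data. The name `build_sleep_dataset`, the `anchor_fraction` parameter, and its default of 0.5 are all assumptions for illustration. In practice the original training distribution is rarely available, so the anchor set would probably be some proxy for it.

```python
import random


def build_sleep_dataset(new_samples, anchor_samples, anchor_fraction=0.5, seed=0):
    """Mix new mid-term samples with anchor samples (hypothetical helper).

    anchor_fraction is the target share of the final dataset drawn from
    the anchor set; 0.5 is an arbitrary illustrative default.
    """
    if not 0.0 <= anchor_fraction < 1.0:
        raise ValueError("anchor_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve anchors / (anchors + new) == anchor_fraction for the anchor count.
    n_anchor = round(len(new_samples) * anchor_fraction / (1.0 - anchor_fraction))
    # Never ask for more anchors than exist.
    n_anchor = min(n_anchor, len(anchor_samples))
    mixed = list(new_samples) + rng.sample(list(anchor_samples), n_anchor)
    rng.shuffle(mixed)  # interleave so batches see both kinds of data
    return mixed
```

All new samples are kept and only the anchors are subsampled, on the assumption that the curated mid-term data is the scarce part.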

Also, we wouldn't train on the whole session. A separate critic module, like a reward model, would filter the KV cache to extract the high-value information, acting like a garbage collector that runs before the LoRA fine-tuning.
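
A sketch of what that filtering stage could look like, assuming the session is available as text segments rather than raw KV tensors. `filter_for_replay`, `toy_critic`, and the 0.5 threshold are hypothetical; the keyword heuristic is only a placeholder for a learned reward model.

```python
from typing import Callable, Iterable


def filter_for_replay(
    segments: Iterable[str],
    critic: Callable[[str], float],
    threshold: float = 0.5,
) -> list[str]:
    """Keep only the segments the critic scores at or above the threshold."""
    return [s for s in segments if critic(s) >= threshold]


def toy_critic(segment: str) -> float:
    """Placeholder scorer, NOT a real reward model.

    Counts a few marker words that hint a segment records a decision
    or a fix, and maps the count to a score in [0, 1].
    """
    markers = ("fixed", "decided", "because")
    hits = sum(marker in segment.lower() for marker in markers)
    return min(1.0, hits / 2)
```

For example, with `toy_critic` a segment like "ok thanks" scores 0.0 and is dropped, while "Fixed the bug" scores 0.5 and is kept.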

That's just an idea though. Right now most research focuses on changing the architecture itself (Titans, HOPE...) instead.

DonHopkins 65 days ago

It's a network of computers with GPUs, so there's no reason it can't sleep at the same time it's awake. Just a continuous "sleeping" process going on in the background, incrementally updating the model. No need for the "thinking" process to be "unconscious" while the "sleeping" process runs. Anthropomorphism confuses everything. There's no such thing as "offline hours" because the Earth is a sphere and the United States is not the center of the universe.
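
A toy sketch of that arrangement, using only the Python standard library: a background thread keeps consolidating while the foreground keeps answering. `BackgroundConsolidator` and its methods are hypothetical, and the "update" is just a version counter standing in for an incremental adapter update.

```python
import queue
import threading


class BackgroundConsolidator:
    """Serve requests while a worker thread consolidates in the background."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._lock = threading.Lock()
        self.adapter_version = 0   # stand-in for the current adapter weights
        self.consolidated = []     # samples the worker has processed so far
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, sample):
        """Queue a sample for background consolidation without blocking."""
        self._inbox.put(sample)

    def _run(self):
        while True:
            sample = self._inbox.get()
            if sample is None:     # sentinel: shut down
                break
            with self._lock:
                # Stand-in for one incremental update step.
                self.consolidated.append(sample)
                self.adapter_version += 1

    def answer(self, prompt):
        """The "awake" path: reads whichever adapter version is current."""
        with self._lock:
            version = self.adapter_version
        return f"[adapter v{version}] {prompt}"

    def close(self):
        """Process everything queued so far, then stop the worker."""
        self._inbox.put(None)
        self._worker.join()
```

The lock is only held for the swap, so `answer` never waits on a whole consolidation pass. A real system would also need to decide when a partially trained adapter is safe to serve, which this sketch ignores.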

fc417fc802 64 days ago

> the Earth is a sphere and the United States is not the center of the universe.

Felt like stating the obvious there? Greenwich being the center of everything after all.
