by rahen 18 days ago
That's an idea I had a few months ago: whenever the KV cache nears capacity and a compaction runs, accumulate the compacted knowledge into a dataset, then fine-tune a LoRA on it during offline hours.

This would create a three-layer memory system:

- Stable long-term memory (initial base weights)

- Mid-term memory built from the compactions and replay buffers

- Short-term memory (KV cache)

Sleeping would just be a fancy term for consolidating and transferring information from one memory layer to another during offline hours. Maybe that's also what the brain does while sleeping.
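To make the three layers concrete, here is a toy sketch of the data flow only. Everything in it is a hypothetical stand-in: `ThreeLayerMemory`, `observe`, `sleep` and the `compact` summarizer are made-up names, the "KV cache" is a list of strings, and the LoRA "training" step just records the compacted examples instead of updating any weights.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ThreeLayerMemory:
    """Toy model of the three memory layers; nothing here touches a real model."""
    capacity: int                        # max entries the short-term layer holds
    compact: Callable[[List[str]], str]  # hypothetical summarizer used for compaction
    base: Tuple[str, ...] = ()           # long-term: frozen base weights, never modified
    adapter: List[str] = field(default_factory=list)   # mid-term: stand-in for LoRA weights
    pending: List[str] = field(default_factory=list)   # compactions waiting for the next sleep
    kv_cache: List[str] = field(default_factory=list)  # short-term: stand-in for the KV cache

    def observe(self, token: str) -> None:
        """Append to short-term memory, compacting when it reaches capacity."""
        self.kv_cache.append(token)
        if len(self.kv_cache) >= self.capacity:
            summary = self.compact(self.kv_cache)
            self.pending.append(summary)  # keep the compaction as future training data
            self.kv_cache = [summary]     # keep working from the compacted context

    def sleep(self) -> int:
        """Offline consolidation: move pending compactions into the mid-term layer.

        A real system would fine-tune the LoRA on `pending` here; this sketch
        only records the examples. Returns how many were consolidated.
        """
        n = len(self.pending)
        self.adapter.extend(self.pending)
        self.pending.clear()
        return n


mem = ThreeLayerMemory(capacity=3, compact=lambda xs: "|".join(xs))
for t in ["a", "b", "c", "d"]:
    mem.observe(t)
print(mem.pending, mem.kv_cache)  # ['a|b|c'] ['a|b|c', 'd']
```

The base layer is deliberately never written to: only the adapter changes during "sleep", which is what keeps the stable long-term memory stable.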


Wouldn't that just accelerate model collapse? How much do you trust the LLM's outputs to provide trustworthy and valuable new information? I understand that distillation works, but that's much more structured and thoughtful than my sessions, at least.
We can trust the feedback we give it on the outputs it provides.
What kind of feedback are you giving? What's the reward function?
Right now, no feedback, since I don't run this system, but our workflows could change to accommodate it.
I was thinking of curated replay buffers, which would act like "dreams". To prevent collapse, the offline dataset would mix the new mid-term data with a baseline of anchor data (the original training distribution) so the model doesn't drift.
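A minimal sketch of the anchor mixing, assuming a single `anchor_fraction` knob (a hypothetical parameter; the 0.5 default is arbitrary, not a recommendation) and sampling anchors with replacement from whatever slice of the original training distribution is available. `build_replay_dataset` is an illustrative name, not an existing API.

```python
import random
from typing import List, Sequence


def build_replay_dataset(new_data: Sequence[str],
                         anchor_data: Sequence[str],
                         anchor_fraction: float = 0.5,
                         seed: int = 0) -> List[str]:
    """Mix new mid-term examples with anchor examples so that roughly
    `anchor_fraction` of the result comes from the original distribution."""
    if not 0.0 <= anchor_fraction < 1.0:
        raise ValueError("anchor_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Anchors needed so that anchors / (new + anchors) == anchor_fraction
    n_anchor = round(len(new_data) * anchor_fraction / (1.0 - anchor_fraction))
    anchors = rng.choices(anchor_data, k=n_anchor) if anchor_data else []
    mixed = list(new_data) + anchors
    rng.shuffle(mixed)  # interleave so no training batch is all-new data
    return mixed


mixed = build_replay_dataset(["n1", "n2", "n3", "n4"], ["a1", "a2"], anchor_fraction=0.5)
```

With four new examples and a 0.5 fraction, four anchor samples are drawn, so half of each epoch still looks like the original distribution.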

Also, we wouldn't train on the whole session. A separate critic module, like a reward model, would filter the KV cache to extract the high-value information, acting like a garbage collector that runs before the LoRA training step.
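The critic-as-garbage-collector step could look roughly like this. It is a sketch under loud assumptions: the session is treated as a list of text segments rather than raw KV tensors, and `critic` is any hypothetical scoring function (a reward model in the real idea, a lookup table here). `filter_session`, `threshold` and `max_keep` are illustrative names.

```python
from typing import Callable, List, Optional, Sequence


def filter_session(segments: Sequence[str],
                   critic: Callable[[str], float],
                   threshold: float = 0.5,
                   max_keep: Optional[int] = None) -> List[str]:
    """Keep only the segments the critic scores at or above `threshold`.

    If `max_keep` is set, keep the highest-scoring ones (ties go to the
    earlier segment). Survivors are returned in their original order.
    """
    scored = [(critic(seg), i, seg) for i, seg in enumerate(segments)]
    kept = [entry for entry in scored if entry[0] >= threshold]
    if max_keep is not None:
        kept.sort(key=lambda entry: (-entry[0], entry[1]))  # best first
        kept = kept[:max_keep]
    kept.sort(key=lambda entry: entry[1])  # restore session order
    return [seg for _, _, seg in kept]


scores = {"x": 0.9, "y": 0.2, "z": 0.7}  # toy critic scores
print(filter_session(["x", "y", "z"], lambda s: scores[s]))  # ['x', 'z']
```

Only what survives this filter would go into the LoRA dataset; everything else is "collected" and never trained on.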

That's just an idea, though. Right now most research focuses on changing the architecture itself (Titans, HOPE...) instead.

It's a network of computers with GPUs, so there's no reason it can't sleep at the same time it's awake. Just a continuous "sleeping" process going on in the background, incrementally updating the model. No need for the "thinking" process to be "unconscious" while the "sleeping" process runs. Anthropomorphism confuses everything. There's no such thing as "offline hours" because the Earth is a sphere and the United States is not the center of the universe.
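The "sleep while awake" variant amounts to a producer/consumer setup: the serving path keeps handing off compactions, and a background worker consolidates them in batches without ever pausing inference. A sketch with Python's `threading` and `queue`, where `BackgroundConsolidator` and `_update_adapter` are hypothetical names and the "update" only bumps a version counter instead of training a LoRA.

```python
import queue
import threading
from typing import List


class BackgroundConsolidator:
    """Worker thread that keeps consolidating while the serving path stays awake."""

    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size
        self.inbox = queue.Queue()  # items: compaction text, or None to stop
        self.adapter_version = 0    # read these two fields after close()
        self.consolidated: List[str] = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, compaction: str) -> None:
        """Called from the serving path; never waits on training."""
        self.inbox.put(compaction)

    def close(self) -> None:
        self.inbox.put(None)  # sentinel: flush the last partial batch, then stop
        self._worker.join()

    def _run(self) -> None:
        batch: List[str] = []
        while True:
            item = self.inbox.get()
            if item is not None:
                batch.append(item)
            if batch and (item is None or len(batch) >= self.batch_size):
                self._update_adapter(batch)
                batch = []
            if item is None:
                return

    def _update_adapter(self, batch: List[str]) -> None:
        # Stand-in for an incremental LoRA update on this batch
        self.consolidated.extend(batch)
        self.adapter_version += 1


c = BackgroundConsolidator(batch_size=2)
for s in ["a", "b", "c"]:
    c.submit(s)
c.close()
```

Batching is the design knob here: smaller batches keep the adapter fresher, larger ones amortize the cost of each update.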
> the Earth is a sphere and the United States is not the center of the universe.

Felt like stating the obvious there? Greenwich being the center of everything after all.