|
|
|
|
|
by rahen
26 days ago
|
|
I was thinking of curated replay buffers, which would act like "dreams". To prevent collapse, the offline dataset would mix the new mid-term data with a baseline of anchor data (the original training distribution) so the model doesn't drift. Also, we wouldn't train on the whole session. A separate critic module, like a reward model, would filter the KV cache to extract the high-value information, like a garbage collector before the LoRA. That's just an idea though. Right now most research focuses on changing the architecture itself (TITAN, HOPE...) instead. |
|