Hacker News
by codelion 473 days ago
that's interesting... i've been noticing similar issues with long context windows & forgetting. are you seeing that the model drifts more towards the beginning of the context or is it seemingly random?

i've also been experimenting with different chunking strategies to see if that helps maintain coherence over larger contexts. it's a tricky problem.

1 comment

Neither lost-in-the-middle nor long-context performance has seen much improvement in the past year. It's not easy to generate long training examples that also stay meaningful, and all existing models still become significantly dumber after 20-30k tokens, particularly on hard tasks.

Reasoning models probably need some optimization constraint put on the length of the CoT, and also some priority constraint (only reason about things that need it).
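
To make the length constraint concrete, here is a rough sketch of one possible reading: subtract a penalty from the task reward once the CoT runs past a token budget. Everything here (`length_penalized_reward`, `budget`, `penalty_per_token`, and the default values) is a hypothetical illustration, not taken from any particular training setup, and it does not cover the priority constraint.

```python
def length_penalized_reward(task_score: float,
                            cot_tokens: int,
                            budget: int = 2000,
                            penalty_per_token: float = 0.001) -> float:
    """Hypothetical reward shaping: penalize CoT tokens beyond a budget.

    task_score        -- reward for solving the task (assumed to be in [0, 1])
    cot_tokens        -- number of tokens in the chain of thought
    budget            -- tokens allowed before any penalty applies (assumed value)
    penalty_per_token -- cost of each token over the budget (assumed value)
    """
    # Only tokens past the budget are penalized; short CoTs cost nothing.
    overflow = max(0, cot_tokens - budget)
    return task_score - penalty_per_token * overflow
```

With these made-up defaults, a correct answer reached in 1,500 CoT tokens keeps its full reward, while the same answer after 3,000 tokens loses most of it, so the optimizer is pushed toward reasoning only as long as the task needs.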