by codelion
473 days ago
That's interesting... I've been noticing similar issues with long context windows and forgetting. Are you seeing the model drift more towards the beginning of the context, or is it seemingly random? I've also been experimenting with different chunking strategies to see if that helps maintain coherence over larger contexts. It's a tricky problem.
Reasoning models probably need some optimization constraint put on the length of the CoT, and also some priority constraint (only reason about things that need it).
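To make that concrete, here is a rough sketch of what the two constraints could look like as a reward-shaping term plus a gate. Everything in it is an assumption for illustration: the names `length_penalized_reward` and `should_reason`, the token budget, the per-token penalty, and the idea of a scalar difficulty score are all hypothetical and not taken from any particular training setup.

```python
def length_penalized_reward(
    task_reward: float,
    cot_tokens: int,
    budget: int = 512,                # hypothetical CoT token budget
    penalty_per_token: float = 0.001  # hypothetical cost per token over budget
) -> float:
    """Length constraint: subtract a linear penalty for CoT tokens beyond the budget."""
    overflow = max(0, cot_tokens - budget)
    return task_reward - penalty_per_token * overflow


def should_reason(difficulty: float, threshold: float = 0.5) -> bool:
    """Priority constraint: only spend reasoning tokens when the (assumed)
    difficulty estimate reaches the threshold."""
    return difficulty >= threshold


if __name__ == "__main__":
    print(length_penalized_reward(1.0, 400))   # within budget, no penalty
    print(length_penalized_reward(1.0, 1512))  # 1000 tokens over budget
    print(should_reason(0.2), should_reason(0.9))
```

A linear penalty is just the simplest choice; a hard cutoff or a penalty that scales with task difficulty would be equally plausible ways to express the same idea.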