|
|
|
|
|
by soletta
119 days ago
|
|
I was a bit dubious until I read the gist. I've used a similar technique before to 'tame' GPT-3.5 and keep it following instructions and it worked well (though I had to ask the model to essentially repeat instructions after every 10 or so turns). I'm surprised you see that much drift though; older models were pretty bad with long contexts but I feel like the problem has mostly gone away with Claude Opus 4.6. |
|
On drift being "mostly gone" — depends on prompt complexity. With a simple system prompt, sure, modern models hold up fine. But with a large instruction set (mine is ~4000 tokens, 25+ rules across 7 sections) the drift is very much still there, even on Opus. The more rules you have, the more they compete for attention, and the easier it is for specific ones to drop off mid-session.
Also worth noting — this isn't limited to coding agents. Any long-running LLM workflow with complex instructions has the same problem. Customer support bots that forget their tone policy, medical assistants that stop citing sources, content moderation that gets lenient over time. If you have a system prompt with rules and a session longer than 20 minutes — the rules will decay.