|
|
|
|
|
by rox_kd
79 days ago
|
|
In what settings do you mean - there are multiple strategies, I think building your own compaction layer in front seems a bit over-kill ? have you considered implementing some cache strategy, otherwise summary pipelines - I made once an agent which based on the messages routed things to a smaller model for compaction / summaries to bring down the context, for the main agent. But also ensuring you start new fresh context threads, instead of banging through a single one untill your whole feature is done .. working in small atomic incrementals works pretty good |
|
But my issue wasn’t just inefficiency, it was agents retrying when they shouldn’t.
I needed visibility + limits per agent/task, and the ability to cut it off, not just optimize it.