Hacker News
by
Jgrubb
7 days ago
The tokens are still being burnt; they're just being burnt in a parallel dimension from the user's main context window.
2 comments
ajmurmann
7 days ago
It's true that the initial tool response still has the same number of tokens, but it doesn't keep getting dragged along in the longer-lived top-level context.
knollimar
6 days ago
Don't you resend the context after every turn, so splitting it off avoids the n^2 token usage? (Granted, it's cached, so there's some optimal amount here.)
ajmurmann
6 days ago
Yes, exactly. You resend it on every turn (assuming no cache hits). This is why using the shorter-lived subagent to take in that context and return only the useful result to the longer-lived context saves tokens.
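To put rough numbers on it, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names are hypothetical, the token counts are made up, caching is ignored, and the subagent is charged only for reading the raw tool output once.

```python
def cumulative_input_tokens(start_context: int, per_turn: int, turns: int) -> int:
    """Total input tokens sent when the full context is resent on every turn."""
    total = 0
    context = start_context
    for _ in range(turns):
        total += context      # the whole context goes over the wire again
        context += per_turn   # this turn's new messages are appended
    return total


def inline_tokens(base: int, tool_output: int, per_turn: int, turns: int) -> int:
    """Raw tool output stays in the long-lived context for every later turn."""
    return cumulative_input_tokens(base + tool_output, per_turn, turns)


def subagent_tokens(base: int, tool_output: int, summary: int,
                    per_turn: int, turns: int) -> int:
    """Subagent reads the raw output once; only a summary joins the main context."""
    one_off = tool_output  # paid once, in the short-lived subagent context
    return one_off + cumulative_input_tokens(base + summary, per_turn, turns)


# Hypothetical numbers: 2k base prompt, 20k tool output, 500-token summary,
# 300 new tokens per turn, 10 turns.
inline = inline_tokens(2_000, 20_000, 300, 10)
split = subagent_tokens(2_000, 20_000, 500, 300, 10)
print(inline, split)  # 233500 58500
```

The raw output is paid for once either way, but inline it is paid for again on each of the following turns, which is where the gap comes from. With cache hits the gap shrinks, so the break-even point depends on how the provider prices cached tokens.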
ViewTrick1002
7 days ago
The real benefit is being able to use a cheaper, but good enough, model with a specific system prompt dedicated to that task.