Hacker News
by Jgrubb 7 days ago
The tokens are still being burnt; they're just being burnt in a parallel dimension, separate from the user's main context window.

It's true that the initial tool response still costs the same number of tokens, but it doesn't keep getting dragged along in the longer-lived top-level context.
Don't you resend the context after every turn? Then splitting it off avoids the n^2 token usage (granted, it's cached, so there's some optimal amount here).
Yes, exactly. You resend it on every turn (assuming no cache hits). That's why having a shorter-lived subagent take in that context and return only the useful result to the longer-lived context saves tokens.
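A rough back-of-the-envelope model of the n^2 point, assuming the full history is resent as input on every turn, no cache hits, and a fixed number of tokens added per turn. All sizes below (turn count, per-turn tokens, tool output, summary length) are made-up illustration values, not measurements, and `resent_input_tokens` is a hypothetical helper:

```python
def resent_input_tokens(num_turns, per_turn, inserts):
    """Sum of input tokens when the entire history is resent on every turn.

    per_turn: tokens appended to the history each turn (hypothetical constant).
    inserts:  {turn_index: extra_tokens} added to the history at that turn.
    Ignores prompt caching, so every resent token counts at full price.
    """
    history = 0
    total = 0
    for turn in range(1, num_turns + 1):
        history += per_turn + inserts.get(turn, 0)
        total += history  # the whole history is sent as input this turn
    return total


N, PER_TURN = 20, 500               # hypothetical: 20 turns, 500 tokens per turn
TOOL_OUTPUT, SUMMARY = 10_000, 300  # hypothetical tool dump and subagent summary
TOOL_TURN = 3

# Tool output pasted straight into the main context: resent on turns 3..20.
inline = resent_input_tokens(N, PER_TURN, {TOOL_TURN: TOOL_OUTPUT})

# Subagent reads the tool output once; only the short summary enters the main
# context. (Ignores the subagent's own system prompt and output tokens.)
subagent_read = TOOL_OUTPUT
delegated = resent_input_tokens(N, PER_TURN, {TOOL_TURN: SUMMARY}) + subagent_read

print(f"inline: {inline:,} tokens, delegated: {delegated:,} tokens")
```

The baseline history alone already grows quadratically (500 × 20 × 21 / 2 = 105,000 input tokens). On top of that, an insert of X tokens at turn k is billed (N − k + 1) times, so the 10,000-token dump costs 18 × 10,000 = 180,000 extra tokens inline versus 10,000 + 18 × 300 = 15,400 when delegated. Caching changes the per-token price of the resent prefix but not the shape of this curve.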
The real benefit is being able to use a cheaper, but good enough, model with a specific system prompt dedicated to that task.
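To put a number on the cheaper-model point, here is a tiny sketch with hypothetical prices (USD per million input tokens, not any vendor's actual pricing). It assumes the subagent's main job is reading a large tool response, so its cost is dominated by input tokens:

```python
# Hypothetical prices, USD per million input tokens; not real list prices.
LARGE_MODEL_PRICE = 3.00
SMALL_MODEL_PRICE = 0.25


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of sending `tokens` input tokens at a given per-million price."""
    return tokens * usd_per_million / 1_000_000


TOOL_OUTPUT = 10_000  # hypothetical size of the raw tool response

# The same subagent read, run on the main (large) model vs. a cheaper model.
same_model = input_cost_usd(TOOL_OUTPUT, LARGE_MODEL_PRICE)
cheap_model = input_cost_usd(TOOL_OUTPUT, SMALL_MODEL_PRICE)

print(f"large: ${same_model:.4f}, small: ${cheap_model:.4f}, "
      f"ratio: {same_model / cheap_model:.0f}x")
```

The saving simply tracks the price ratio (12x with these made-up numbers), and it stacks with the token saving from keeping the dump out of the main context. Whether the cheaper model is actually good enough depends on the task, which is where the narrow, task-specific system prompt comes in.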