| HN Mirror



	by Jgrubb 7 days ago
	The tokens are still being burnt; they're just being burnt in a parallel dimension from the user's main context window.

2 comments

ajmurmann 7 days ago

It's true that the initial tool response still has the same number of tokens, but it doesn't keep getting dragged along in the longer-lived top context.


knollimar 6 days ago

Don't you resend the context after every turn, so splitting it off avoids the n^2 token usage? (Granted, it's cached, so there's some optimal amount here.)


ajmurmann 6 days ago

Yes, exactly. You resend it on every turn (assuming no cache hits). This is why using the shorter-lived subagent to take in that context and return only the useful result to the longer-lived context saves tokens.
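
A rough back-of-the-envelope sketch of that arithmetic, assuming no caching at all and a fixed number of new tokens per turn. The function names and the example numbers (10,000-token tool output, 500-token summary, 200 tokens per turn, 20 turns) are made up for illustration, not measurements from any real agent.

```python
def inline_tokens(tool_tokens: int, turn_tokens: int, turns: int) -> int:
    """Total input tokens sent when the raw tool output stays in the main context."""
    total = 0
    context = tool_tokens  # raw tool output sits in the long-lived context
    for _ in range(turns):
        context += turn_tokens  # each turn appends some new tokens
        total += context        # the whole context is resent every turn (no cache)
    return total


def subagent_tokens(tool_tokens: int, summary_tokens: int,
                    turn_tokens: int, turns: int) -> int:
    """Total input tokens sent when a subagent reads the raw output once
    and only a summary is carried in the main context."""
    total = tool_tokens       # the subagent still pays for the raw output, once
    context = summary_tokens  # the main context only holds the summary
    for _ in range(turns):
        context += turn_tokens
        total += context
    return total


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the difference.
    print(inline_tokens(10_000, 200, 20))          # 242000
    print(subagent_tokens(10_000, 500, 200, 20))   # 62000
```

The raw output is paid for once in both cases; the difference is that in the inline case it is paid for again on each of the later turns. With prompt caching the gap shrinks, which is the "some optimal amount" caveat above.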


ViewTrick1002 7 days ago

The real benefit is being able to use a cheaper, but good enough, model with a specific system prompt dedicated to that task.
