| HN Mirror



	by password4321 4 days ago
	This is interesting to me because reducing context & token usage is in the user's best interest but not in the financial interest of AI vendors. I am not an expert but it sounds like your "one simple trick" would fix context issues and allow much tighter control over token usage. Thanks for being willing to share this tip in an HN comment, changing how those in the know use AI agents going forward -- it's hard to keep up!

2 comments

Jgrubb 4 days ago

The tokens are still being burnt, they're just being burnt in a parallel dimension from the user's main context window.


ajmurmann 4 days ago

It's true that the initial tool response still has the same number of tokens, but it doesn't keep getting dragged along in the longer-lived top-level context.


knollimar 3 days ago

Don't you resend after every turn, so splitting it avoids the n^2 token usage? (Granted, it's cached, so there's some optimal amount here.)


ajmurmann 3 days ago

Yes, exactly. You resend it on every turn (assuming no cache hits). This is why using the shorter-lived subagent to take in that context and return only the useful result to the longer-lived context saves tokens.
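
To put rough numbers on it, here is a minimal sketch of the arithmetic. All the sizes below (tokens per turn, tool output size, summary size, turn count) are made-up values for illustration, and it assumes the whole history is resent every turn with no prompt caching at all, which is the worst case.

```python
from itertools import accumulate


def total_input_tokens(added_per_turn):
    """Total input tokens sent when the full history is resent every turn.

    added_per_turn[i] is how many new tokens get appended to the context
    on turn i. Assumes no prompt caching (worst case).
    """
    # accumulate() yields the context size at each turn; summing those
    # sizes gives everything that was sent across the conversation.
    return sum(accumulate(added_per_turn))


# Hypothetical sizes, chosen only to make the effect visible.
TURNS = 20
CHAT_TOKENS = 500      # new tokens added by an ordinary turn
TOOL_OUTPUT = 10_000   # raw tool response
SUMMARY = 300          # what the subagent hands back

# Inline: the raw tool output lands in the main context on the first turn
# and is resent on every later turn.
inline = [CHAT_TOKENS] * TURNS
inline[0] += TOOL_OUTPUT

# Subagent: the raw output is read once in a short-lived context, and only
# the summary is carried along in the main context.
delegated = [CHAT_TOKENS] * TURNS
delegated[0] += SUMMARY
subagent_read = TOOL_OUTPUT

inline_total = total_input_tokens(inline)
delegated_total = total_input_tokens(delegated) + subagent_read

print(inline_total)     # 305000
print(delegated_total)  # 121000
```

The tool output is still paid for once either way, which is Jgrubb's point. The difference is that inline it is paid for again on each of the remaining turns. With caching the gap shrinks, so treat this as an upper bound on the savings rather than a prediction.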


ViewTrick1002 4 days ago

The real benefit is being able to use a cheaper, but good enough, model with a specific system prompt dedicated to that task.


loeg 4 days ago

> This is interesting to me because reducing context & token usage is in the user's best interest but not in the financial interest of AI vendors.

AI vendors still need to compete with each other on both token cost and competency. An agent that wastes tokens, making it costlier and less effective, is less competitive.
