| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ajstars 105 days ago
	The compute cost debate misses a subtler point: the real cost multiplier isn't inference, it's context length. Most agent frameworks naively stuff 6-8k tokens into every prompt turn. If you route intelligently and compress memory hierarchically, you can bring that down to 200-400 tokens per turn with no quality loss. The model cost then becomes almost irrelevant.