by ajstars 105 days ago
The compute cost debate misses a subtler point: the real cost multiplier isn't the model's per-token price, it's context length. Most agent frameworks naively stuff 6-8k tokens into the prompt on every turn. If you route intelligently and compress memory hierarchically, you can bring that down to 200-400 tokens per turn with no quality loss. At that point the model cost becomes almost irrelevant.
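To put rough numbers on that, here is a back-of-the-envelope sketch. The price per million input tokens and the session length are made-up placeholders, not figures from any provider, and 7,000 and 300 are just midpoints of the ranges above. The ratio between the two cases does not depend on the price you plug in.

```python
def prompt_cost(tokens_per_turn: int, turns: int, usd_per_million_tokens: float) -> float:
    """Input-token cost in USD when `tokens_per_turn` is sent on every turn."""
    return tokens_per_turn * turns * usd_per_million_tokens / 1_000_000


# Hypothetical values for illustration only.
PRICE_USD_PER_MILLION = 3.0  # assumed input-token price
TURNS = 50                   # assumed agent session length

naive = prompt_cost(7_000, TURNS, PRICE_USD_PER_MILLION)     # midpoint of 6-8k
compressed = prompt_cost(300, TURNS, PRICE_USD_PER_MILLION)  # midpoint of 200-400
ratio = naive / compressed  # price and turn count cancel out here

print(f"naive:      ${naive:.3f} per session")
print(f"compressed: ${compressed:.3f} per session")
print(f"ratio:      {ratio:.1f}x")
```

With these placeholder inputs the naive session costs roughly 23 times as much as the compressed one. This only counts prompt tokens; it ignores output tokens and whatever it costs to produce the compressed memory in the first place.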