|
|
|
|
|
by ajstars
105 days ago
|
|
The compute cost debate misses a subtler point: the real cost multiplier isn't inference, it's context length. Most agent frameworks naively stuff 6-8k tokens into every prompt turn. If you route intelligently and compress memory hierarchically, you can bring that down to 200-400 tokens per turn with no quality loss. The model cost then becomes almost irrelevant. |
|