Hacker News
by novachen 99 days ago
We've been dealing with this exact problem building agent-driven workflows. A few things that have actually helped:

The unpredictability is worse than the absolute cost. Our billing model broke several times not because costs were high, but because we couldn't bound them. One approach that helped: define a 'token budget' per user action at design time - cap total tokens per session and treat hitting the cap as a first-class outcome your product handles gracefully, not an error.
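
To make the budget idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration (the `TokenBudget` class, the `Outcome` enum, `run_session`, and the per-step token counts); it is not our production code, and in a real system you would only know a step's token count after the model call returns, so the pre-check would have to use an estimate.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    # Hypothetical outcome type: running out of budget is a normal result,
    # not an exception.
    OK = "ok"
    BUDGET_EXHAUSTED = "budget_exhausted"


@dataclass
class TokenBudget:
    limit: int      # cap chosen at design time
    used: int = 0   # tokens consumed so far in this session

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    def try_spend(self, tokens: int) -> bool:
        """Spend tokens if they fit in the remaining budget; otherwise spend nothing."""
        if tokens > self.remaining:
            return False
        self.used += tokens
        return True


def run_session(step_token_costs, budget):
    """Run steps until done or until the next step would exceed the budget.

    step_token_costs stands in for real model calls (assumed values).
    Returns (outcome, number_of_completed_steps).
    """
    completed = 0
    for cost in step_token_costs:
        if not budget.try_spend(cost):
            # The caller decides what "graceful" means here: partial result,
            # upsell prompt, asking the user to narrow the task, etc.
            return Outcome.BUDGET_EXHAUSTED, completed
        completed += 1
    return Outcome.OK, completed
```

The point of returning an `Outcome` instead of raising is that the UI layer has to handle both branches explicitly, which is what keeps the cap from turning into a generic error page.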

On the forecasting side, we track cost per workflow step rather than per request. Step-level cost is much more stable than request-level because it absorbs the variance in tool calls and retries. Once you have step costs, you can forecast by expected workflow composition.
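
A rough sketch of what that forecasting looks like, assuming you already log a cost for each executed step. The step names, the numbers, and both function names are hypothetical; the arithmetic is just "mean cost per step times expected runs of that step per workflow", which is the simplest version of this and ignores variance entirely.

```python
from collections import defaultdict
from statistics import mean


def mean_step_costs(records):
    """records: iterable of (step_name, cost) pairs from your own logs.

    Returns {step_name: mean observed cost}. A step's cost should already
    include its tool calls and retries, which is what makes it stable.
    """
    by_step = defaultdict(list)
    for step, cost in records:
        by_step[step].append(cost)
    return {step: mean(costs) for step, costs in by_step.items()}


def forecast_workflow_cost(step_costs, composition):
    """composition: {step_name: expected number of runs per workflow}.

    Expected counts can be fractional (e.g. a step that runs in half of
    all workflows has a count of 0.5).
    """
    return sum(step_costs[step] * runs for step, runs in composition.items())


# Illustrative data only: costs in tokens per step execution.
records = [("plan", 200), ("plan", 400), ("search", 1000), ("search", 2000)]
costs = mean_step_costs(records)
expected = forecast_workflow_cost(costs, {"plan": 1, "search": 2.5})
```

With those made-up numbers, `plan` averages 300 tokens and `search` averages 1500, so a workflow expected to run one plan step and 2.5 search steps forecasts at 4050 tokens.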

On fixed subscription pricing for AI APIs - I'd actually pay a premium for that. The unpredictability creates a hidden cost: you over-provision margins and add complexity to your pricing tier design. A flat rate for a capacity bucket would eliminate both.

The question I'd ask about any such service: how do they handle the tail cases where agents go off the rails and rack up 10x normal token usage? That's where the cost risk actually lives.