Hacker News
by AlexCalderAI 104 days ago
Nice approach to the per-key cost cap problem. We built something similar for tracking AI spend across providers - the "one forgotten loop" scenario is real and expensive.

The JSON-on-disk pattern works surprisingly well for this scale. We found the key insight is making costs visible in real time rather than waiting for end-of-month bills. Even just seeing token counts per request changes behavior.
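
For anyone who wants to picture the JSON-on-disk pattern, here is a minimal sketch. It is not the code from the post or from our tool: the ledger layout, the `record_usage` name, and the per-token price are all assumptions made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical price, for illustration only; real pricing varies by provider and model.
PRICE_PER_1K_TOKENS_USD = 0.002


def record_usage(ledger_path, api_key, tokens, cap_usd):
    """Add one request's cost to a JSON ledger and enforce a per-key cap.

    Returns the key's running total in USD. Raises RuntimeError if the
    request would push the key over cap_usd (the ledger is left unchanged).
    """
    # Load the ledger, or start empty if the file does not exist yet.
    try:
        with open(ledger_path) as f:
            ledger = json.load(f)
    except FileNotFoundError:
        ledger = {}

    cost = tokens / 1000 * PRICE_PER_1K_TOKENS_USD
    new_total = ledger.get(api_key, 0.0) + cost
    if new_total > cap_usd:
        raise RuntimeError(
            f"cap exceeded for {api_key}: {new_total:.4f} > {cap_usd:.4f}"
        )

    ledger[api_key] = new_total
    # Write to a temp file, then rename, so a crash never leaves a half-written ledger.
    dir_name = os.path.dirname(os.path.abspath(ledger_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as f:
        json.dump(ledger, f)
    os.replace(tmp_path, ledger_path)
    return new_total
```

Because the running total is returned on every call, it can be printed next to each response, which is the visibility part. Note that this sketch does no file locking, so two processes writing at the same moment could lose an update; a real setup would probably want a lock or a small database.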

Curious if you've hit the CLI latency wall yet with concurrent users - that 3-8s overhead compounds fast with a team.

1 comment

Spot on, real-time cost visibility changes behavior more than any hard limit does. On CLI latency with concurrent users: each request spawns its own process, so they don't block each other, but yeah, 3-8s per call isn't great at scale. It works fine for a small team doing internal demos and testing, which is all this is meant for. It becomes very noticeable for chat-type functionality, though; that kind of latency ruins the conversational flow. Anything needing real-time responses or throughput should just use the official APIs directly.
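
To illustrate the process-per-request point, here is a rough sketch, not the project's actual code. It uses a stand-in child process (a Python one-liner that sleeps and echoes) in place of the real CLI, and the names `run_request` and `run_concurrently` are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the real CLI: a child Python process that sleeps, then echoes
# the prompt in upper case. The sleep plays the role of the per-call overhead.
CHILD_CODE = (
    "import sys, time; time.sleep(float(sys.argv[2])); print(sys.argv[1].upper())"
)


def run_request(prompt, delay_s=0.2):
    """Spawn one child process for one request and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, prompt, str(delay_s)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def run_concurrently(prompts, max_workers=8):
    """Run each prompt in its own process; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_request, prompts))
```

With this shape, several simultaneous requests finish in roughly the time of one call rather than the sum, as long as there are enough workers. Each individual caller still waits out the full per-call latency, which is why it hurts in chat-style use.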