
Spot on, real-time cost visibility changes behavior more than any hard limit does. On CLI latency with concurrent users: each request spawns its own process, so they don't block each other, but yeah, 3-8s per call isn't great at scale. It works fine for a small team doing internal demos/testing, which is all this is meant for. It becomes very noticeable for chat-type functionality, though; that kind of latency ruins the conversational flow. Anything needing real-time responses or high throughput should just use the official APIs directly.
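For anyone curious what "one process per request" looks like, here's a rough sketch in Python. This is not the project's actual code: `FAKE_CLI`, `call_cli`, `handle_requests`, and `run_demo` are made-up names, and the "CLI" is just a child Python process that sleeps for half a second and echoes its prompt, standing in for the real tool.

```python
import asyncio
import sys
import time

# Hypothetical stand-in for the real CLI: a child Python process that sleeps
# briefly and echoes its prompt. Swap in the actual command for real use.
FAKE_CLI = [
    sys.executable,
    "-c",
    "import sys, time; time.sleep(0.5); print(sys.argv[1])",
]


async def call_cli(prompt: str) -> str:
    # One OS process per request, so a slow call never blocks another caller.
    proc = await asyncio.create_subprocess_exec(
        *FAKE_CLI,
        prompt,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.decode().strip()


async def handle_requests(prompts: list[str]) -> list[str]:
    # Launch all calls at once; gather returns replies in request order.
    return await asyncio.gather(*(call_cli(p) for p in prompts))


def run_demo(prompts: list[str]) -> tuple[list[str], float]:
    # Time the whole batch so it can be compared against one-at-a-time calls.
    start = time.perf_counter()
    replies = asyncio.run(handle_requests(prompts))
    return replies, time.perf_counter() - start


if __name__ == "__main__":
    replies, elapsed = run_demo(["a", "b", "c"])
    print(replies, f"{elapsed:.2f}s")
```

The calls overlap instead of queueing, but each caller still waits the full duration of their own call, which is why this doesn't help the chat case.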