| I built this after reading too many incident reports of agent loops spending
$200 in 4 minutes because a quality threshold was never met.

The pattern is always the same: an agent retries, fans out, or loops. Each
iteration passes individual rate-limit checks. Observability fires an alert
after the money is gone. Provider caps are per-provider, not cross-provider.
None of these stop the spend before it happens.

RunCycles takes a different approach: reserve budget before the call, commit
actual spend after, release the remainder if the work is cancelled. The
reservation is atomic across all affected budget scopes — tenant, workspace,
agent — using Redis Lua scripts so concurrent agents sharing a budget can't
collectively overrun it.

The integration surface is small:

    @cycles(estimate=50_000, action_kind="llm.completion", action_name="gpt-4o")
    def call_llm(prompt: str) -> str:
        return openai.complete(prompt)

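To make the reserve / commit / release flow concrete, here is a minimal in-memory sketch. It is not the RunCycles implementation or client API: `InMemoryBudget`, `budgeted`, `measure`, and `BudgetExceeded` are hypothetical names, the units are arbitrary, and a `threading.Lock` stands in for the atomicity that the real server gets from Redis Lua scripts.

```python
import threading
from functools import wraps


class BudgetExceeded(Exception):
    """Raised when a reservation does not fit in every affected scope."""


class InMemoryBudget:
    """Toy stand-in for the server; one lock plays the role of an atomic script."""

    def __init__(self, limits):
        self._remaining = dict(limits)  # scope name -> remaining budget units
        self._lock = threading.Lock()

    def reserve(self, scopes, amount):
        # Check every scope first, then deduct from all of them, under one lock,
        # so a reservation is applied to all scopes or to none.
        with self._lock:
            for scope in scopes:
                if self._remaining[scope] < amount:
                    raise BudgetExceeded(scope)
            for scope in scopes:
                self._remaining[scope] -= amount
        return {"scopes": tuple(scopes), "amount": amount}

    def commit(self, reservation, actual):
        # Charge the actual spend by handing back the unused part of the estimate.
        # If actual exceeds the estimate the refund is negative, i.e. the overage
        # is simply charged (one possible overage policy, chosen for brevity).
        self._refund(reservation, reservation["amount"] - actual)

    def release(self, reservation):
        # Work was cancelled or failed: hand back the whole reservation.
        self._refund(reservation, reservation["amount"])

    def _refund(self, reservation, amount):
        with self._lock:
            for scope in reservation["scopes"]:
                self._remaining[scope] += amount

    def remaining(self, scope):
        with self._lock:
            return self._remaining[scope]


def budgeted(budget, scopes, estimate, measure):
    """Hypothetical decorator: reserve before the call, commit or release after."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            # Raises BudgetExceeded before fn runs if any scope is short.
            reservation = budget.reserve(scopes, estimate)
            try:
                result = fn(*args, **kwargs)
            except Exception:
                budget.release(reservation)
                raise
            budget.commit(reservation, measure(result))
            return result
        return wrapper
    return decorator


budget = InMemoryBudget({"tenant": 100, "agent": 60})


@budgeted(budget, scopes=("tenant", "agent"), estimate=50, measure=len)
def fake_llm(prompt):
    return prompt.upper()  # stand-in for a model call; "cost" is the output length


fake_llm("hello")  # reserves 50 in both scopes, then commits an actual spend of 5
```

The point of the sketch is the ordering: the check and the deduction happen together, before the wrapped function runs, so a caller that would push any scope over its limit fails without touching the others.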
When budget is exhausted, the next reservation attempt gets a 409
BUDGET_EXCEEDED before the downstream call is made.

The architecture is three pieces:

- Cycles Protocol: an open OpenAPI spec defining the reservation lifecycle,
idempotency semantics, scope hierarchy, and overage policies.
- RunCycles Server: Spring Boot + Redis, implements the spec. Runs in Docker.
- Clients: Python, TypeScript, Java/Spring Boot.

The hardest part was idempotency under retries: if a commit fails transiently
and retries with the same key, it should get the original response back, not
double-charge. The Lua scripts handle this atomically.

What it's not: a billing system, observability dashboard, or agent framework.
It's the layer that decides whether an action may proceed before it proceeds.

Org: https://github.com/runcycles
Docs: https://runcycles.github.io/docs |