| Hey HN, I built this because I wanted to give my team access to Claude and GPT models for internal testing, but the official APIs have no per-key spending controls. You can't cap a key at $5/day or 100 requests/month — it's all or nothing. With non-technical team members in the mix (designers, PMs, QA), one forgotten loop or oversized prompt away from an ugly bill wasn't a risk I wanted to manage manually. Idea was to allow the members to test with these restricted API keys before using official keys. So I built a bridge: it wraps the Claude Code CLI and Codex CLI behind an Express API, backed by existing Max/Pro subscriptions instead of per-token billing. Each team member gets their own API key with hard limits — requests/day, tokens/month, cost caps. Hit the limit and the key stops working. No surprises. An admin dashboard shows who's using what in real time. Key features:
- Two providers: /generate (Claude) and /generate-codex (Codex)
- Per-user API keys with SHA-256 hashing (shown once, never stored raw)
- Per-key hard limits with real-time tracking and enforcement
- Admin dashboard for key management, usage monitoring, and request logs
- Deploy on a $5 VPS behind Cloudflare Tunnel What it's NOT: A production API replacement. It's for internal tooling and prototyping. CLI invocations add ~3-8s latency vs direct API calls. Important: Wrapping CLI subscriptions behind a shared API may violate the Terms of Service of the underlying providers. Anthropic's Consumer ToS (updated Feb 2026) prohibits using subscription OAuth tokens in third-party tools, and OpenAI's ToS prohibits account sharing. Review the applicable terms before using this. See the Disclaimer section in the README for details. Security was a focus: execFile (no shell injection), timing-safe auth, CSP/HSTS, input validation, rate limiting. Details in SECURITY.md. Stack: Node.js, TypeScript, Express. No database — JSON files on disk. GitHub: https://github.com/Shreyas-Dayal/ai-cli-bridge Would love feedback on the approach and any security concerns I might have missed. |
The JSON-on-disk pattern works surprisingly well for this scale. We found the key insight is making costs visible in real-time rather than waiting for end-of-month bills. Even just seeing token counts per request changes behavior.
Curious if you've hit the CLI latency wall yet with concurrent users - that 3-8s overhead compounds fast with a team.