I've been collecting real incidents over the past few months. The pattern is always identical:

- $128,000: small company in Japan, caught it at $44K, shut everything down, charges kept accumulating, Google denied the adjustment request
- $82,314: 3-person startup, Gemini API key silently reused, normal monthly spend was $180
- $55,444: student, key leaked on GitHub during summer break, discovered months later
- ~$75,000: student in India building a blood cancer detection tool, received legal threats from Google

The root cause in every case: GCP billing data lags 4-12 hours. By the time a budget alert fires, the damage is already done. There is no native automatic kill switch.

The only reliable real-time signal is `serviceruntime.googleapis.com/api/request_count` via Cloud Monitoring, which has a 3-5 minute ingestion delay. Budget alerts don't use this; they use billing data.

Has anyone else dealt with this? Curious how teams are protecting themselves today.
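To make the request-count idea concrete, here is a minimal sketch of the decision logic a kill switch could run on per-minute samples of that metric. Everything in it is an assumption for illustration: the `Sample` shape, the `should_trip` name, the baseline, and the thresholds are hypothetical. Fetching the samples from Cloud Monitoring and actually disabling the key are deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Sample:
    """One per-minute point of the request-count metric (hypothetical shape)."""
    minute: int    # minutes since an arbitrary start
    requests: int  # requests observed during that minute


def should_trip(
    samples: Iterable[Sample],
    *,
    baseline_per_minute: float,
    spike_factor: float = 10.0,   # assumed threshold, tune for your traffic
    sustained_minutes: int = 3,   # assumed window, avoids tripping on one blip
) -> bool:
    """Return True when every one of the latest samples exceeds the spike limit."""
    recent = sorted(samples, key=lambda s: s.minute)[-sustained_minutes:]
    if len(recent) < sustained_minutes:
        return False  # not enough data to call it a sustained spike
    limit = baseline_per_minute * spike_factor
    return all(s.requests > limit for s in recent)


# Usage: ten quiet minutes, then three minutes of abuse on a leaked key.
normal = [Sample(m, 4) for m in range(10)]
leak = normal + [Sample(10, 900), Sample(11, 1200), Sample(12, 1500)]
print(should_trip(normal, baseline_per_minute=5))  # False
print(should_trip(leak, baseline_per_minute=5))    # True
```

Requiring a sustained spike trades a few minutes of extra exposure for fewer false trips. With the 3-5 minute ingestion delay on top, the reaction time is still minutes rather than the hours a billing-based alert needs.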
- Built internal tooling to keep keys out of AI chats and anywhere they could leak. The moment a raw key enters a conversation or a shared space, you've lost control of it.
- LLM gateways with capped virtual keys per developer and separate service accounts. If a key leaks, it's easy to kill, doesn't affect the product, and the damage is capped — not your whole billing account.
- A scoped intermediary layer for any autonomous agents. Anything running without a human in the loop gets its own access that we can kill in seconds.
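For anyone wondering what "capped virtual keys" could look like, below is a rough in-memory sketch, not our actual tooling and not any particular gateway product. The `Gateway`, `VirtualKey`, and `KeyRejected` names are made up for illustration, and a real setup would persist state and meter cost from the provider's responses.

```python
import secrets
from dataclasses import dataclass


class KeyRejected(Exception):
    """Raised when a virtual key is unknown, revoked, or over its cap."""


@dataclass
class VirtualKey:
    owner: str
    cap_usd: float
    spent_usd: float = 0.0
    revoked: bool = False


class Gateway:
    """Hypothetical gateway: callers hold virtual keys, never the real one."""

    def __init__(self) -> None:
        self._keys: dict[str, VirtualKey] = {}

    def issue(self, owner: str, cap_usd: float) -> str:
        """Create a virtual key with its own spending cap."""
        token = "vk_" + secrets.token_urlsafe(16)
        self._keys[token] = VirtualKey(owner, cap_usd)
        return token

    def revoke(self, token: str) -> None:
        """Kill one key without touching any other key or the product."""
        self._keys[token].revoked = True

    def charge(self, token: str, cost_usd: float) -> float:
        """Record spend before forwarding a request; return the remaining budget."""
        key = self._keys.get(token)
        if key is None or key.revoked:
            raise KeyRejected("unknown or revoked key")
        if key.spent_usd + cost_usd > key.cap_usd:
            raise KeyRejected("cap exceeded")
        key.spent_usd += cost_usd
        return key.cap_usd - key.spent_usd


# Usage: one developer key and one agent key, each with its own cap.
gw = Gateway()
dev = gw.issue("alice", cap_usd=50.0)
agent = gw.issue("nightly-agent", cap_usd=5.0)
print(gw.charge(dev, 20.0))  # 30.0
gw.revoke(agent)             # the agent is cut off, alice is unaffected
```

The point of the design is that the worst case for a leaked key is its own cap, and revoking it is a single call that leaves every other key working.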
We ended up building some custom tooling here specifically for working with AI agents. There's always this tension between the easy way (just paste it into the chat, it'll be fine) and the proper way, which usually ends up being too cumbersome for anyone to actually follow.
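One way to make the easy path safe is to scrub keys before text ever reaches a chat or agent. The sketch below is illustrative only: the `redact` helper is hypothetical, and the regex is the shape commonly used by secret scanners for Google API keys (`AIza` followed by 35 URL-safe characters), which I'm treating as an assumption rather than an official specification.

```python
import re

# Assumed key shape: "AIza" plus 35 characters from [0-9A-Za-z_-].
GOOGLE_API_KEY = re.compile(r"AIza[0-9A-Za-z_\-]{35}")


def redact(text: str, placeholder: str = "[REDACTED_KEY]") -> tuple[str, int]:
    """Replace anything that looks like a Google API key; return (text, hits)."""
    return GOOGLE_API_KEY.subn(placeholder, text)


# Usage with an obviously fake key.
fake_key = "AIza" + "x" * 35
clean, hits = redact(f"here is my config: key={fake_key}")
print(clean)  # here is my config: key=[REDACTED_KEY]
print(hits)   # 1
```

A pattern match like this only catches keys that fit the assumed format, so it complements capped and revocable keys rather than replacing them.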