Who doesn't have hard billing caps for inference? Microsoft, Google and AWS my friend. And you know who uses Microsoft, Google and AWS? Almost all big corporations do use them instead of direct OAI or Anthropic API because all their contracts and infra are built around the big cloud providers.
> Long-running tasks like batch mode completions and agent sessions may incur overages beyond your project spend cap.
> Billing data processing times can be delayed in AI Studio, up to around 10 minutes. You may experience overages beyond your project cap if billing data hasn't processed before more charges are accrued.
Anthropic: https://support.claude.com/en/articles/8977456-how-do-i-pay-... - you can pre-pay and get a hard cutoff.
OpenAI: https://community.openai.com/t/how-to-set-billing-limits-and... - last time I looked OpenAI had a soft but not hard limit, I guess they fixed that last year.
I remember bugging them both about this last year, I need to update my mental model!