by evara-ai 103 days ago

This is a real operational problem when you're building client-facing automation systems on top of these APIs. I build chatbots, workflow automation, and AI agent systems for clients, and the hardest conversation is explaining that your system's uptime is fundamentally capped by your LLM provider's uptime.

Patterns that have helped in production:

1. Multi-provider fallback. For conversational systems, route to Claude by default, fall back to GPT-4 on 5xx errors. The response quality difference is usually acceptable for the 2-3% of requests that hit the fallback. This turns a hard outage into a slight quality degradation.

2. Async queuing for non-real-time workflows. If you're processing documents, generating reports, or running batch analysis, don't call the API synchronously. Queue the work, retry with exponential backoff, and let the system self-heal when the API recovers. Most of our automation pipelines run with a 15-minute SLA, not a 500ms one.

3. Graceful degradation in real-time systems. For chatbots and voice agents, have a scripted fallback path. "I'm having trouble processing that right now — let me transfer you to a human" is infinitely better than a hung connection or error message.
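
To make patterns 1 and 3 concrete, here is a minimal sketch of a fallback chain that ends in a scripted reply. Everything named here is hypothetical: `ProviderError`, `complete_with_fallback`, and the idea that each provider is wrapped as a plain `prompt -> text` callable are assumptions for illustration, not any vendor's SDK.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status a provider returned."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


# Assumption: each provider is wrapped as a function from prompt to reply text.
Provider = Callable[[str], str]

SCRIPTED_FALLBACK = (
    "I'm having trouble processing that right now. "
    "Let me transfer you to a human."
)


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> str:
    """Try providers in order, moving to the next one only on 5xx errors."""
    for call in providers:
        try:
            return call(prompt)
        except ProviderError as err:
            if not 500 <= err.status < 600:
                # A 4xx is most likely our own bug, not an outage; surface it.
                raise
    # Every provider is down: degrade to the scripted path instead of hanging.
    return SCRIPTED_FALLBACK
```

With a primary wrapper (say, around Claude) and a secondary one (say, around GPT-4), a call would look like `complete_with_fallback(user_message, [call_primary, call_secondary])`. A real version would probably also treat timeouts as a reason to fall back, which this sketch leaves out.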

The broader issue: we're all building on infrastructure where "four nines" isn't even on the roadmap yet. That's fine if you architect for it — treat LLM APIs like any other unreliable external dependency, not like a database query.
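
For the queued workflows in pattern 2, "architecting for it" mostly comes down to the worker retrying transient failures instead of failing the job. A rough sketch of that retry step, assuming a hypothetical `TransientError` that your API wrapper raises for 5xx responses; the queue itself is not shown, and the random jitter is my addition on top of plain exponential backoff:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. 5xx responses."""


def retry_with_backoff(
    job: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run job, retrying on TransientError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return job()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the queue mark the job as failed
            # The cap doubles each attempt: 1s, 2s, 4s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many workers from retrying in lockstep
            # the moment the API comes back.
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is only there so the function can be tested without waiting. With a 15-minute SLA there is room for several attempts before anyone has to be paged.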