Ask HN: How do you prevent retry cascades in LLM systems?
2 points
by amabito
120 days ago
We ran into a retry amplification issue in one of our LLM agents recently. The provider returned 429s for a short period. We had per-call retry limits in place. We did NOT have containment at the request-chain level.

Because calls were nested and spread across multiple workers, retries multiplied in ways we didn’t anticipate. Per-call limits were not enough.

For those running LLM systems in production:
– Do you implement chain-level retry budgets?
– Shared circuit breaker state?
– Per-minute cost ceilings?
– Cost-based limits (tokens/$) rather than retry count?
– Or is exponential backoff usually sufficient in practice?

I’m trying to understand what actually works at scale, beyond monitoring dashboards that tell you after the fact.
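
To make the first option concrete, here is a minimal sketch of what a chain-level retry budget could look like. It is not what we run in production. `RetryBudget`, `call_with_budget`, and `RateLimited` are hypothetical names for illustration, and the counter is in-process only; with multiple workers it would presumably have to live in a shared store. The idea is that every call in one request chain draws retries from the same allowance, so nesting cannot multiply them.

```python
import threading


class RateLimited(Exception):
    """Stand-in for a provider 429 response (hypothetical)."""


class RetryBudgetExceeded(Exception):
    """Raised when the request chain has no retries left."""


class RetryBudget:
    """Retry allowance shared by every call in one request chain."""

    def __init__(self, max_retries: int) -> None:
        self._remaining = max_retries
        self._lock = threading.Lock()

    def try_spend(self) -> bool:
        """Consume one retry if any are left; return False once empty."""
        with self._lock:
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            return True

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining


def call_with_budget(fn, budget: RetryBudget, per_call_retries: int = 3):
    """Call fn(), retrying on RateLimited while both limits allow it."""
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimited as exc:
            if attempt >= per_call_retries:
                # Per-call limit hit: let the caller decide what to do.
                raise
            if not budget.try_spend():
                # Chain-wide allowance is gone: stop the whole chain.
                raise RetryBudgetExceeded("chain retry budget exhausted") from exc
            attempt += 1
            # Backoff would go here; omitted to keep the sketch short.
```

With two nested layers that each allow 3 retries, per-call limits alone permit up to 4 × 4 = 16 attempts against the provider. If both layers share one `RetryBudget(max_retries=5)`, the chain stops after at most 1 + 5 = 6 attempts, however deep the nesting goes.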