Ask HN: How to scale agent systems when Layer 7 is unreliable?
1 point by rjpruitt16, 108 days ago

Agent workflows often involve 10+ API calls to different services (LLMs, data APIs, web scraping). When Layer 7 (the application-level APIs themselves) is unreliable, workflows fail or trigger retry storms. Common failure modes I'm thinking about:
- 429 rate limits → agents retry → hammer the API even harder
- Partial outages → synchronized retries across customers
- LangGraph workflows fail mid-execution → how do you resume?

For those running agent systems at scale:
- How do you handle Layer 7 failures?
- Do you coordinate retries? Use circuit breakers?
- How do you prevent retry storms against downstream dependencies?
- Do LangGraph workflows handle API failures gracefully?

Curious what the production reality looks like.
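The first two failure modes (429s that provoke more retries, and synchronized retries during a partial outage) are commonly mitigated with capped exponential backoff plus random "full jitter". Below is a minimal Python sketch under stated assumptions: `RetryableError` is a hypothetical exception the caller raises for 429/503 responses, optionally carrying a `Retry-After` value in seconds; the function names and default parameters are illustrative and not tied to any HTTP client or to LangGraph.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical error for transient failures such as HTTP 429 or 503."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after  # server-suggested wait in seconds, if any


def full_jitter_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Random delay between 0 and min(cap, base * 2**attempt) for 0-based retry `attempt`."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(fn, max_attempts=5, base=0.5, cap=30.0, sleep=time.sleep, rng=random):
    """Call fn(), retrying RetryableError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError as err:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the error instead of hammering the API
            delay = full_jitter_delay(attempt, base, cap, rng)
            if err.retry_after is not None:
                # Treat the server's Retry-After hint as a lower bound on the wait.
                delay = max(delay, err.retry_after)
            sleep(delay)


# Usage: a fake endpoint that returns 429 twice, then succeeds.
calls = {"n": 0}


def flaky_endpoint():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RetryableError("429 Too Many Requests", retry_after=1.0)
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky_endpoint, sleep=delays.append)
print(result, delays)  # ok [1.0, 1.0]
```

Full jitter spreads each client's retries uniformly across the backoff window, so many agents that failed at the same moment do not come back in lockstep. Capping attempts bounds how much extra load a single workflow can add; coordinating a shared retry budget across agents would be a further step not shown here.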
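On the circuit-breaker question: a breaker stops a client from calling a dependency after repeated failures, fails fast during a cool-down period, and then lets a trial call through ("half-open") to test recovery. The sketch below is a minimal, single-threaded Python version; the `CircuitBreaker` class, its thresholds, and the injectable `clock` are illustrative assumptions rather than any library's API, and a real deployment would also need locking and per-dependency breakers.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are short-circuited."""


class CircuitBreaker:
    """Minimal closed/open/half-open circuit breaker (single-threaded sketch)."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn, *args, **kwargs):
        state = self.state
        if state == "open":
            # Fail fast instead of adding load to a dependency that is already struggling.
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open and restart the cool-down
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Usage with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def failing():
    raise ConnectionError("upstream 503")


for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
print(breaker.state)  # open
now[0] = 10.0
print(breaker.state)  # half-open
print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```

Combined with jittered backoff, an open breaker turns "every agent keeps retrying a dead API" into "agents fail fast and one trial call probes for recovery", which is one common way to keep retry storms off downstream dependencies.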