Show HN: Stop Losing LangGraph Progress to 429 Errors
(ezthrottle.network)
1 points
by rjpruitt16
127 days ago
Hey HN, I built this because I kept losing progress in LangGraph workflows when OpenRouter or OpenAI returned 429s.
The problem: You're 7 steps into an agent workflow. Step 7 hits a rate limit. Everything crashes. Restart from step 1.
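
To make the failure mode concrete, here is a toy sketch in plain Python (it does not use LangGraph's API, and it is not EZThrottle's implementation). It assumes each step's result is stored under a per-step key, so a rerun after a 429 at step 7 picks up at step 7 instead of step 1. The names `run_workflow`, `make_steps`, `RateLimited`, and the `run_id:step-N` key format are all made up for illustration.

```python
class RateLimited(Exception):
    """Stands in for an HTTP 429 from a provider."""


def run_workflow(run_id, steps, checkpoints, executed):
    """Run steps in order, skipping any step whose result is already stored."""
    state = None
    for index, step in enumerate(steps, start=1):
        key = f"{run_id}:step-{index}"  # hypothetical idempotency key format
        if key in checkpoints:
            state = checkpoints[key]  # reuse the stored result, do not re-run
            continue
        state = step(state)  # may raise RateLimited
        executed.append(index)
        checkpoints[key] = state  # checkpoint only after the step succeeds
    return state


def make_steps(fail_once_at):
    """Build 8 toy steps; the step at `fail_once_at` raises a 429 the first time."""
    failed = {"done": False}

    def make(index):
        def step(state):
            if index == fail_once_at and not failed["done"]:
                failed["done"] = True
                raise RateLimited(f"429 at step {index}")
            return (state or 0) + 1

        return step

    return [make(i) for i in range(1, 9)]


def demo():
    steps = make_steps(fail_once_at=7)
    checkpoints, executed = {}, []
    try:
        run_workflow("run-42", steps, checkpoints, executed)
    except RateLimited:
        pass  # first attempt dies at step 7; steps 1-6 are already stored
    first_attempt = list(executed)
    executed.clear()
    result = run_workflow("run-42", steps, checkpoints, executed)
    return first_attempt, executed, result
```

In this toy, the first attempt executes steps 1 through 6 before failing, and the rerun executes only steps 7 and 8. A real setup would keep the checkpoints in durable storage rather than an in-memory dict.
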
Client-side retries don't help at scale:
- 100 workers all retry independently → retry storm
- Sequential fallbacks are slow (try OpenRouter, wait 5s, try Anthropic, wait 5s)
- No coordination across instances

So I built a coordination layer that:
- Races multiple providers simultaneously (OpenRouter + Anthropic + OpenAI)
- Coordinates retries across all workers (no retry storms)
- Resumes workflows via webhooks (idempotency keys = checkpoints)

It runs on Fly.io's anycast network + BEAM for distributed coordination.
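
For anyone who wants to see what "racing" means compared with sequential fallback, here is a minimal client-side sketch using Python's `asyncio`. It is only an illustration of the idea, not how EZThrottle does it (the real coordination runs on BEAM, per the post). `call_provider` is a hypothetical stand-in that sleeps instead of making an HTTP request, and the delays are invented.

```python
import asyncio


class RateLimited(Exception):
    """Stands in for an HTTP 429 from a provider."""


async def call_provider(name, delay, fail=False):
    """Hypothetical provider call: sleeps to simulate latency, then answers or 429s."""
    await asyncio.sleep(delay)
    if fail:
        raise RateLimited(f"{name} returned 429")
    return f"response from {name}"


async def race(calls):
    """Start every call at once; return the first success and cancel the rest."""
    tasks = [asyncio.create_task(call) for call in calls]
    errors = []
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except RateLimited as exc:
                errors.append(exc)  # this provider lost; keep waiting on the others
        raise RuntimeError(f"all providers failed: {errors}")
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished


def demo():
    # Simulated latencies; the first provider is rate limited.
    return asyncio.run(race([
        call_provider("openrouter", 0.01, fail=True),
        call_provider("anthropic", 0.03),
        call_provider("openai", 0.06),
    ]))
```

Here the 429 from the first provider costs nothing extra, because the other two requests are already in flight. The trade-off is that racing sends more requests than a sequential fallback would, which is one reason to coordinate it centrally rather than in every worker.
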
Architecture deep dive: https://www.ezthrottle.network/blog/making-failure-boring-ag...
Happy to answer questions about the approach or why BEAM made this possible when other languages would struggle.