Show HN: Tokio-prompt-orchestrator – LLM pipeline orchestration in Rust
(github.com)
2 points
by Shmungus
111 days ago
I built this after getting frustrated with "multi-agent" frameworks that claim parallelism but are really just one fat async task with no resource bounds. tokio-prompt-orchestrator breaks LLM inference into 5 physical stages (RAG → Assemble → Inference → Post-Process → Stream), each running in its own Tokio task with bounded channels between them. When a stage falls behind, backpressure builds locally instead of blowing up the whole pipeline.
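A minimal sketch of that staged layout, not the project's actual code. To keep it dependency-free, std threads and `std::sync::mpsc::sync_channel` stand in for Tokio tasks and Tokio's bounded `mpsc` channels; the idea is the same, since a full queue makes the upstream `send` wait. The names `spawn_stage` and `run_pipeline`, the queue capacities, and the string transforms are all made up for illustration.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Run one stage on its own thread (hypothetical helper).
/// `capacity` bounds this stage's output queue.
fn spawn_stage<F>(input: Receiver<String>, capacity: usize, work: F) -> Receiver<String>
where
    F: Fn(String) -> String + Send + 'static,
{
    let (tx, rx) = sync_channel::<String>(capacity);
    thread::spawn(move || {
        for item in input {
            // `send` blocks while the downstream queue is full, so a slow
            // consumer stalls only this stage: backpressure stays local.
            if tx.send(work(item)).is_err() {
                break; // downstream hung up
            }
        }
        // `tx` is dropped here, which closes the next stage's input.
    });
    rx
}

/// Wire up RAG -> Assemble -> Inference -> Post-Process, then collect the
/// results as a stand-in for the Stream stage. The transforms are placeholders.
fn run_pipeline(prompts: Vec<String>) -> Vec<String> {
    let (source, rag_in) = sync_channel::<String>(2);
    let rag = spawn_stage(rag_in, 2, |p| format!("{p} +ctx"));
    let assembled = spawn_stage(rag, 2, |p| format!("[{p}]"));
    // Capacity 1: inference is assumed to be the slow stage.
    let inferred = spawn_stage(assembled, 1, |p| format!("reply({p})"));
    let post = spawn_stage(inferred, 2, |p| p.trim().to_string());

    // Feed from a separate thread so the collector below can drain concurrently.
    thread::spawn(move || {
        for p in prompts {
            if source.send(p).is_err() {
                break;
            }
        }
    });

    post.into_iter().collect()
}
```

Each stage has a single worker and the queues are FIFO, so output order matches input order. With Tokio the shape would be the same, with `send(...).await` yielding instead of blocking a thread.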
Some things that might be interesting to folks here:
- Circuit breakers per provider (OpenAI, Anthropic, local llama.cpp) so one failing API doesn't cascade
- Request deduplication that saved 60-80% on inference costs in my testing
- Prometheus metrics + a TUI dashboard for watching the pipeline in real time
- MCP server integration so you can use it as a Claude Desktop tool
It's 58k lines of Rust, MIT licensed, no unsafe. Been running it in production for my own projects for a few months now.
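For readers unfamiliar with the per-provider circuit breaker idea mentioned above, here is a rough sketch of the usual closed / open / half-open state machine. It is an assumption about the general pattern, not the project's implementation; `CircuitBreaker`, `threshold`, and `cooldown` are illustrative names. Time is passed in explicitly so the logic is easy to test.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed,                  // calls flow normally
    Open { since: Instant }, // calls are rejected until the cooldown elapses
    HalfOpen,                // cooldown elapsed, probing the provider
}

/// Hypothetical breaker: opens after `threshold` consecutive failures.
struct CircuitBreaker {
    state: State,
    failures: u32,
    threshold: u32,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { state: State::Closed, failures: 0, threshold, cooldown }
    }

    /// Returns true if a call to the provider may be attempted at `now`.
    fn allow(&mut self, now: Instant) -> bool {
        if let State::Open { since } = self.state {
            if now.duration_since(since) >= self.cooldown {
                // Simplification: every call is allowed until the next
                // `record`, rather than exactly one probe.
                self.state = State::HalfOpen;
            } else {
                return false;
            }
        }
        true
    }

    /// Report the outcome of a call that `allow` let through.
    fn record(&mut self, ok: bool, now: Instant) {
        if ok {
            self.failures = 0;
            self.state = State::Closed;
        } else {
            self.failures += 1;
            // A failed probe, or too many consecutive failures, opens the circuit.
            if self.state == State::HalfOpen || self.failures >= self.threshold {
                self.state = State::Open { since: now };
            }
        }
    }
}
```

Keeping one breaker per provider, for example in a `HashMap` keyed by provider name, is what stops a failing API from affecting requests routed to the others.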
Would love feedback on the channel sizing heuristics and the retry/backoff strategy; those were the hardest parts to get right. Happy to answer questions about the architecture.
GitHub: https://github.com/Mattbusel/tokio-prompt-orchestrator
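As a concrete baseline for the retry/backoff discussion, here is a sketch of capped exponential backoff. The post does not describe the project's actual strategy, so this is only the textbook version; `backoff_delay` and `retry` are hypothetical names, and jitter is left out to keep the example deterministic and dependency-free.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt); // saturates instead of overflowing
    base.saturating_mul(factor).min(cap)
}

/// Call `op` until it succeeds or `max_attempts` calls have been made
/// (it is always called at least once). `sleep` is injected so callers
/// can block, await, or just record the delays.
fn retry<T, E>(
    max_attempts: u32,
    base: Duration,
    cap: Duration,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(backoff_delay(attempt, base, cap));
                attempt += 1;
            }
        }
    }
}
```

In real use the `sleep` argument could be `std::thread::sleep`. A production version would usually add random jitter to each delay so that many clients do not retry in lockstep.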