by PranayKumarJain
121 days ago
Nice work. Real-time voice plumbing always looks “simple” until you build it. A few things that helped us keep cost + complexity sane on similar voice-agent flows:

- Treat the call as a state machine (collect slots -> confirm -> execute). Don’t let the LLM free-run every turn; use small models for routing/slot-filling, escalate only on ambiguity.
- Put hard guardrails on “thinking”: max tokens/turn + short system prompts. It’s shocking how often cost is prompt bloat + retry loops.
- If you’re using Twilio, Media Streams + a streaming STT/TTS loop reduces latency and avoids “LLM per sentence” patterns.
- Phone-number discovery: try a tiered approach (cached business DB / Places API / fallback scrape) and cache aggressively; scraping every time is where it gets gnarly.

We build production voice agents at eboo.ai and have hit the same Twilio + latency + cost cliffs. Happy to share patterns if you want to compare notes.
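To make the state-machine point concrete, here is a rough Python sketch of the collect -> confirm -> execute flow. Everything in it is made up for illustration: the slot names, the shape of `parsed` (assumed to come from a small slot-filling model), and the action strings. The point is only that the code decides the next step, and the bigger model is called only on the `"escalate"` branch.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    COLLECT = auto()
    CONFIRM = auto()
    EXECUTE = auto()
    DONE = auto()


# Hypothetical slots for a booking-style call.
REQUIRED_SLOTS = ("business", "date", "time")


@dataclass
class Call:
    state: State = State.COLLECT
    slots: dict = field(default_factory=dict)


def step(call: Call, parsed: dict) -> str:
    """Advance the call by one turn and return the next action.

    `parsed` is assumed to be the output of a cheap slot-filling model,
    e.g. {"slots": {...}, "confirmed": True | False | None}.
    """
    if call.state is State.COLLECT:
        call.slots.update(parsed.get("slots", {}))
        missing = [s for s in REQUIRED_SLOTS if s not in call.slots]
        if missing:
            return f"ask:{missing[0]}"  # templated question, no big LLM
        call.state = State.CONFIRM
        return "confirm"

    if call.state is State.CONFIRM:
        confirmed = parsed.get("confirmed")
        if confirmed is True:
            call.state = State.EXECUTE
            return "execute"
        if confirmed is False:
            # Caller rejected the summary: start collecting again.
            call.slots.clear()
            call.state = State.COLLECT
            return f"ask:{REQUIRED_SLOTS[0]}"
        return "escalate"  # ambiguous answer: only now call the larger model

    # EXECUTE finishes the call; DONE stays done.
    call.state = State.DONE
    return "done"
```

Since `step` never talks to a model itself, the whole flow can be unit-tested with hand-written `parsed` dicts before any audio is wired in.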
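The tiered phone-number lookup could look something like the sketch below, again in Python. The three tier functions are stand-ins backed by hard-coded dicts, not real clients for a business DB, the Places API, or a scraper, and the class name and phone numbers are invented. It tries the cheapest source first, caches any hit, and only reaches the scrape tier when everything above it misses.

```python
from typing import Callable, Optional

# A tier takes a normalized business name and returns a number or None.
Lookup = Callable[[str], Optional[str]]


class TieredPhoneLookup:
    def __init__(self, tiers: list[tuple[str, Lookup]]):
        self.tiers = tiers                      # ordered cheapest first
        self.cache: dict[str, str] = {}         # in-memory; swap for Redis etc.
        self.calls: dict[str, int] = {name: 0 for name, _ in tiers}

    def find(self, business: str) -> Optional[str]:
        key = business.strip().lower()
        if key in self.cache:
            return self.cache[key]
        for name, lookup in self.tiers:
            self.calls[name] += 1               # track how often each tier is hit
            number = lookup(key)
            if number:
                self.cache[key] = number
                return number
        return None


# Stand-in tiers (hypothetical data, fictional 555 numbers).
def business_db(key: str) -> Optional[str]:
    return {"luigi's pizza": "+15550100"}.get(key)


def places_api(key: str) -> Optional[str]:
    return {"corner cafe": "+15550101"}.get(key)


def scrape(key: str) -> Optional[str]:
    return "+15550199" if key == "tiny shop" else None


lookup = TieredPhoneLookup(
    [("db", business_db), ("places", places_api), ("scrape", scrape)]
)
```

The `calls` counter is there so you can see how often the expensive tiers actually fire; in a real setup you would probably also want a TTL on the cache and negative caching for misses.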