Hacker News
by ramonverse 668 days ago
How do you handle latency issues in real-time voice conversations? What specific optimizations have you implemented in the orchestration layer to minimize delays between speech recognition, LLM processing, and text-to-speech output?
1 comment

Hey! That's a great question. Initially, we ran a separate process for each component (ASR, LLM, TTS) and handled latency through a configurable pair of settings, endpointing and token_size. We made these configurable because latency is critical in some cases, while in others (for example, where responses are longer) it matters much less. Later on, we also added caching and routing to minimize unnecessary calls.
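
To make the settings pair and the caching idea concrete, here is a minimal Python sketch. It is not the actual implementation. It assumes that endpointing is a silence threshold in milliseconds and that token_size is the number of LLM tokens buffered before each TTS call; the names LatencySettings, chunk_for_tts, and CachedResponder, and the preset values, are all hypothetical. Routing is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class LatencySettings:
    endpointing_ms: int  # assumed: silence before the user's turn counts as finished
    token_size: int      # assumed: LLM tokens buffered before each TTS call


# Hypothetical presets: small values favor responsiveness, large ones favor
# smoother long-form speech at the cost of a slower first sound.
LOW_LATENCY = LatencySettings(endpointing_ms=200, token_size=5)
LONG_FORM = LatencySettings(endpointing_ms=700, token_size=40)


def chunk_for_tts(tokens: Iterable[str], token_size: int) -> Iterator[str]:
    """Group streamed LLM tokens into text chunks of token_size tokens each."""
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) >= token_size:
            yield "".join(buffer)
            buffer = []
    if buffer:
        # Flush whatever is left when the stream ends.
        yield "".join(buffer)


class CachedResponder:
    """Skips the backend call when a normalized transcript was seen before."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._cache: Dict[str, str] = {}
        self.backend_calls = 0

    def respond(self, transcript: str) -> str:
        # Normalize case and whitespace so trivially different inputs share a key.
        key = " ".join(transcript.lower().split())
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = self._generate(transcript)
        return self._cache[key]
```

With a small token_size, TTS can start speaking after only a few tokens, which helps short back-and-forth turns. A larger value means fewer, longer TTS calls, which may suit longer responses where the initial delay is less noticeable.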