
Hey! That's a great question. Initially, we ran a separate process for each component (ASR, LLM, TTS) and handled it with a configurable pair of settings, endpointing and token_size. We made these configurable because latency can be an issue in some cases, while in others (where responses are longer) it matters much less. Later on, we also added caching and routing to minimize unnecessary calls.
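To make the idea a bit more concrete, here is a rough Python sketch of how such a settings pair plus a cache could look. It is not our actual code. The names `PipelineSettings`, `chunk_tokens`, and `CachedResponder` are made up for illustration, and so are the preset values. I am also assuming here that endpointing means the amount of silence after which ASR treats the utterance as finished, and that token_size means how many LLM tokens are buffered before a chunk is handed to TTS. Routing is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSettings:
    # Assumption: silence (in ms) after which ASR treats the utterance as done.
    endpointing_ms: int
    # Assumption: number of LLM tokens buffered before a chunk goes to TTS.
    token_size: int


# Hypothetical presets; the numbers are illustrative, not measured values.
LOW_LATENCY = PipelineSettings(endpointing_ms=100, token_size=10)
LONG_FORM = PipelineSettings(endpointing_ms=500, token_size=80)


def chunk_tokens(tokens, token_size):
    """Group a stream of tokens into chunks of at most token_size tokens."""
    buf = []
    for tok in tokens:
        buf.append(tok)
        if len(buf) >= token_size:
            yield " ".join(buf)
            buf = []
    if buf:  # flush whatever is left at the end of the stream
        yield " ".join(buf)


class CachedResponder:
    """Wrap a generate(prompt) callable and skip repeat calls for the same prompt."""

    def __init__(self, generate):
        self._generate = generate
        self._cache = {}
        self.calls = 0  # number of times the wrapped callable actually ran

    def respond(self, prompt):
        # Normalize case and whitespace so trivially different prompts share a key.
        key = " ".join(prompt.lower().split())
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._generate(prompt)
        return self._cache[key]
```

With something like this, a smaller token_size gets the first audio out sooner at the cost of more TTS calls, while a larger one suits longer responses where the extra wait is less noticeable.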