by PranayKumarJain
120 days ago
Great work on open-sourcing the orchestrator. Full-duplex and barge-in are definitely the hardest parts to nail: clearing the audio buffers and killing the LLM stream in under 500 ms makes or breaks the "human" feel. Curious how you're handling VAD in noisy environments. Do you find the RMS-based approach holds up for telephony, or are you considering a more robust model-based VAD (like Silero) in the future? We're tackling similar low-latency orchestration challenges at eboo.ai. It's great to see more Go-based tools in this space. Subscribed to the repo!
You're spot on about VAD, too. RMS is our 'MVP debt': it's fine for clean mics, but we're definitely looking at a Silero bridge for telephony and noisy environments.
Also, we actually built this because we run Lokutor (ultra-low latency TTS). If you guys at eboo.ai are hunting for faster inference, hit me up. I'd love to get you a key to play with.
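
For readers following the VAD discussion, here is a minimal sketch of what an RMS-based detector typically looks like in Go. It is not the orchestrator's actual code: the `EnergyVAD` type, its `Threshold` and `Hangover` fields, and the 16-bit PCM frame format are all assumptions made for illustration.

```go
package main

import "math"

// rms returns the root-mean-square level of a frame of 16-bit PCM
// samples, normalized to the range [0, 1].
func rms(frame []int16) float64 {
	if len(frame) == 0 {
		return 0
	}
	var sum float64
	for _, s := range frame {
		v := float64(s) / 32768.0
		sum += v * v
	}
	return math.Sqrt(sum / float64(len(frame)))
}

// EnergyVAD is a hypothetical fixed-threshold detector (not taken from
// the repo under discussion). Hangover keeps the speech state alive
// across short pauses so that words are not clipped.
type EnergyVAD struct {
	Threshold float64 // RMS level at or above which a frame counts as speech
	Hangover  int     // quiet frames tolerated before speech is considered over
	quiet     int
	speaking  bool
}

// Process consumes one frame and reports whether the detector
// currently considers the caller to be speaking.
func (v *EnergyVAD) Process(frame []int16) bool {
	if rms(frame) >= v.Threshold {
		v.speaking = true
		v.quiet = 0
		return true
	}
	if v.speaking {
		v.quiet++
		if v.quiet > v.Hangover {
			v.speaking = false
		}
	}
	return v.speaking
}
```

The sketch also shows why this approach tends to struggle on telephony audio: the threshold is fixed, so steady line noise or background chatter that sits above it is indistinguishable from speech. A model-based VAD classifies the content of the frame instead of its energy.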