| HN Mirror



	by koljab 450 days ago
	It's in fact using Silero via RealtimeSTT. RealtimeSTT signals when silence starts. A binary sentence classification model then runs on the realtime transcription text; inference is blazingly fast (about 10 ms) and returns a probability between 0 and 1 indicating whether the sentence spoken so far is considered "complete". The turn detection component uses this probability to calculate how long to wait in silence before deciding the "turn is over".
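
To illustrate the last step, here is a minimal sketch of how a completion probability could be turned into a silence waiting time. The comment above does not say which mapping is used, so the linear interpolation, the function names `silence_wait_seconds` and `turn_is_over`, and the `min_wait`/`max_wait` defaults are all assumptions for illustration, not the project's actual code.

```python
def silence_wait_seconds(p_complete: float,
                         min_wait: float = 0.2,
                         max_wait: float = 1.5) -> float:
    """Map a sentence-completion probability to a silence wait time.

    Hypothetical mapping: a sentence that looks complete (p near 1) gets a
    short wait, one that looks unfinished (p near 0) gets a long wait.
    The bounds are illustrative defaults, not values from the source.
    """
    p = min(1.0, max(0.0, p_complete))  # clamp to [0, 1]
    return min_wait + (1.0 - p) * (max_wait - min_wait)


def turn_is_over(p_complete: float, silence_elapsed: float) -> bool:
    """Return True once the observed silence reaches the computed wait time."""
    return silence_elapsed >= silence_wait_seconds(p_complete)
```

With these assumed defaults, half a second of silence ends the turn after a sentence scored 0.9 but not after one scored 0.1, which is the behavior the comment describes: the classifier shortens or lengthens the pause that the VAD has to observe.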

1 comment

thekaranchawla 449 days ago

This is the exact strategy I'm using for the real-time voice agent I'm building. LiveKit also published a custom turn detection model that, judging by the video they released, works really well, which was cool to see.

Code: https://github.com/livekit/agents/tree/main/livekit-plugins/... Blog: https://blog.livekit.io/using-a-transformer-to-improve-end-o...
