|
|
|
|
|
by koljab
402 days ago
|
|
It's in fact using Silero via RealtimeSTT. RealtimeSTT tells when silence starts. Then a binary sentence classification model is used on the realtime transcription text which infers blazingly fast (10ms) and returns a probability between 0 and 1 indicating if the current spoken sentence is considered "complete". The turn detection component takes this information to calculate the silence waiting time until "turn is over". |
|
Code: https://github.com/livekit/agents/tree/main/livekit-plugins/... Blog: https://blog.livekit.io/using-a-transformer-to-improve-end-o...