by com2kid
412 days ago
Much better techniques than pure silence detection exist nowadays:

1. A dedicated model that predicts when a conversation turn is coming up (e.g. when someone is about to stop speaking). Speech has a rhythm to it, and pauses and ends of speech are actually predictable.

2. Generate a model response for every new word that comes in (and throw away the previously generated response), so that once some other detector decides the user is done, your time to speak is basically zero.

3. Ask an LLM how likely it is that the user is done talking, and if the probability is high, shorten the delay timer. (The linked repo does this.)

I don't know of any up-to-date models for #1, but I haven't checked in over a year.

TL;DR: the solution to problems involving AI models is more AI models.
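To make #2 and #3 a bit more concrete, here is a rough Python sketch. Everything in it is made up for illustration: the names `endpoint_delay_ms` and `SpeculativeReply`, the 0.8 threshold, and the 200-1200 ms range are assumptions, not what the linked repo does, and `generate` stands in for whatever model call you actually use.

```python
from typing import Callable, List, Optional


def endpoint_delay_ms(
    p_done: float,
    max_delay_ms: int = 1200,   # assumed default silence timeout
    min_delay_ms: int = 200,    # assumed floor when the model is certain
    threshold: float = 0.8,     # assumed cutoff for "high probability"
) -> int:
    """Technique 3: shorten the silence timer when an LLM thinks the user is done."""
    if not 0.0 <= p_done <= 1.0:
        raise ValueError("p_done must be in [0, 1]")
    if p_done < threshold:
        # Not confident the turn is over: keep the full delay.
        return max_delay_ms
    # Interpolate linearly from max to min as p_done goes from threshold to 1.0.
    frac = (p_done - threshold) / (1.0 - threshold)
    return round(max_delay_ms - frac * (max_delay_ms - min_delay_ms))


class SpeculativeReply:
    """Technique 2: regenerate a reply on every new word, keeping only the latest."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate          # hypothetical model call
        self._words: List[str] = []
        self.latest: Optional[str] = None  # draft reply for the transcript so far

    def on_word(self, word: str) -> None:
        self._words.append(word)
        # The previous draft was based on a shorter transcript, so overwrite it.
        self.latest = self._generate(" ".join(self._words))
```

The idea would be to call `on_word` as the transcriber emits words, and once the timer from `endpoint_delay_ms` expires with no new speech, `latest` is already there to speak. A real system would presumably run generation asynchronously and cancel in-flight requests rather than block on each word.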