by vasishath4
46 days ago
I'm working on something similar myself, and I've noticed that if I pass the user's early speech to the LLM to reduce latency, the chance of interruptions gets even higher. For example, the user may say something like “Yes” followed by a brief pause, which leads the speech model to count that as a complete turn and trigger the LLM call. But then the user may add something more, so I have to cancel the previous request to avoid any irreversible state transitions. And because the speculative calls lower the latency, I get an even smaller window in which to cancel the response or even to stop the model from speaking.
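To make the cancel-before-commit idea concrete, here is a minimal sketch in Python with `asyncio`. It is only an illustration under assumptions, not what my system actually does: `SpeculativeTurn`, `llm_call`, `grace_seconds`, and the two `on_...` hooks are all hypothetical names, and the "irreversible" step is reduced to appending to a list. The LLM call starts as soon as the speech model reports a tentative end of turn, but the reply is only committed once a short grace window has also passed without the user resuming.

```python
import asyncio


class SpeculativeTurn:
    """Starts an LLM call early, but commits the reply only if the user stays silent."""

    def __init__(self, llm_call, grace_seconds=0.3):
        self.llm_call = llm_call            # hypothetical async callable: transcript -> reply
        self.grace_seconds = grace_seconds  # assumed wait before a reply may be committed
        self._task = None
        self.committed = []                 # stands in for speaking / irreversible state changes

    def _cancel_pending(self):
        # Cancel any speculation that has not finished yet.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    def on_tentative_end_of_turn(self, transcript):
        # The speech model thinks the turn is over: speculate on what we have so far.
        self._cancel_pending()
        self._task = asyncio.create_task(self._speculate(transcript))
        return self._task

    def on_user_resumed(self):
        # The user kept talking, so the speculative reply must never be committed.
        self._cancel_pending()

    async def _speculate(self, transcript):
        # Run the LLM call and the grace timer concurrently; the reply is
        # committed only after both finish without the task being cancelled.
        reply, _ = await asyncio.gather(
            self.llm_call(transcript),
            asyncio.sleep(self.grace_seconds),
        )
        self.committed.append(reply)
        return reply


async def demo():
    async def fake_llm(text):
        # Stand-in for a real model call.
        await asyncio.sleep(0.01)
        return f"reply to: {text}"

    turn = SpeculativeTurn(fake_llm, grace_seconds=0.05)
    turn.on_tentative_end_of_turn("Yes")
    await asyncio.sleep(0.02)  # the user resumes inside the grace window
    turn.on_user_resumed()
    task = turn.on_tentative_end_of_turn("Yes, but only on weekdays")
    await task
    return turn.committed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the reply to the bare “Yes” is generated but discarded, and only the reply to the full utterance is committed. The trade-off is that perceived latency becomes the larger of the model latency and the grace window, so the speculation hides the model call behind the wait instead of removing the wait entirely.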
Humans actually do the second thing: we not only use our "model" to figure out the end of a turn, we also predict what the other person is going to say based on context, and will sometimes answer before they even finish.