by vasishath4 46 days ago
I'm working on something similar myself, but I've noticed that if I pass the user's early speech to the LLM to reduce latency, interruptions become even more likely. For example, the user may say “Yes” followed by a brief pause, which the speech model counts as a complete turn, triggering the LLM call. But then the user adds something more, so I have to cancel the previous request to avoid any irreversible state transitions. And because the speculative calls lower the latency, I get an even smaller window to cancel the response or stop the model from speaking.
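One common way to contain this is to treat the early LLM call as purely speculative: start generating on a tentative end of turn, cancel freely if the user keeps talking, and only speak or change state once the turn is confirmed. A minimal sketch using asyncio, assuming a hypothetical `generate_reply` coroutine in place of a real LLM client and hypothetical event hooks (`on_tentative_end`, `on_user_resumed`, `on_confirmed_end`) fed by the speech pipeline. This is an illustration, not the commenter's actual setup.

```python
from __future__ import annotations

import asyncio


async def generate_reply(transcript: str) -> str:
    # Hypothetical stand-in for an LLM call; the sleep simulates inference latency.
    await asyncio.sleep(0.05)
    return f"reply to: {transcript!r}"


class SpeculativeResponder:
    """Starts the LLM call as soon as a turn *might* be over, but only commits
    (speaks, mutates state) once the turn is confirmed."""

    def __init__(self) -> None:
        self._task: asyncio.Task[str] | None = None
        self._transcript = ""
        self.committed: list[str] = []  # stands in for irreversible actions

    def on_tentative_end(self, transcript: str) -> None:
        # Speech model thinks the user stopped: start generating early,
        # discarding any older speculative request first.
        self._cancel()
        self._transcript = transcript
        self._task = asyncio.create_task(generate_reply(transcript))

    def on_user_resumed(self) -> None:
        # User kept talking: drop the speculative reply. Nothing irreversible
        # has happened yet, because committing only occurs in on_confirmed_end.
        self._cancel()

    async def on_confirmed_end(self) -> str:
        # Only now is it safe to speak or change state.
        if self._task is None:
            self._task = asyncio.create_task(generate_reply(self._transcript))
        reply = await self._task
        self._task = None
        self.committed.append(reply)
        return reply

    def _cancel(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None
```

The design choice is that cancellation never races with side effects: the speculative task only produces text, so the "window to cancel" no longer depends on how fast the model is. The trade-off is that some latency is given back, since speaking waits for the confirmed end of turn.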

Detecting end of turn is a whole other issue. You can do the easy thing and treat some fixed number of milliseconds of silence as the end of the turn, or you can spend a lot of money asking the model to figure it out from context.
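The "easy thing" can be sketched as a silence-timeout endpointer: count consecutive quiet audio frames and declare the turn over once they add up to a fixed duration. The 20 ms frame size, the RMS energy threshold, and the 700 ms silence window below are all assumptions chosen for illustration, and `SilenceEndpointer` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class SilenceEndpointer:
    frame_ms: int = 20               # assumed duration of each audio frame
    silence_ms: int = 700            # hypothetical end-of-turn silence threshold
    energy_threshold: float = 0.01   # assumed RMS level separating speech from silence
    _silent_ms: int = field(default=0, init=False, repr=False)
    _heard_speech: bool = field(default=False, init=False, repr=False)

    def push(self, frame_rms: float) -> bool:
        """Feed one frame's RMS energy; return True once the turn is considered over."""
        if frame_rms >= self.energy_threshold:
            # Speech resets the silence counter.
            self._heard_speech = True
            self._silent_ms = 0
            return False
        if not self._heard_speech:
            return False  # ignore leading silence before the user starts talking
        self._silent_ms += self.frame_ms
        return self._silent_ms >= self.silence_ms

    def reset(self) -> None:
        # Call between turns so the next turn starts fresh.
        self._silent_ms = 0
        self._heard_speech = False
```

With these defaults, 200 ms of speech followed by silence ends the turn on the 35th silent frame, while a 400 ms pause after "Yes" does not trigger at all. Lowering `silence_ms` cuts latency but produces more of the premature end-of-turn events described in the parent comment; a single number cannot tell a thinking pause from a finished thought, which is why the context-aware option exists.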

Humans actually do the second thing: we not only use our "model" to figure out the end of a turn, we predict what the other person is going to say based on context and sometimes answer before they've even finished.