HN Mirror


by com2kid 412 days ago

Much better techniques than pure silence detection exist nowadays:

1. A special model that predicts when a conversation turn is coming up (i.e. when someone is about to stop speaking). Speech has a rhythm to it, and pauses and ends of speech are actually predictable.

2. Generate a model response for every subsequent word that comes in (and throw away the previously generated response), so your time to speak, once some other detection decides the user is done, is basically zero.

3. Ask an LLM what it thinks the odds are that the user is done talking, and if the probability is high, shorten the delay timer. (The linked repo does this.)
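As a rough illustration of #3, here is a minimal sketch of turning a "user is done talking" probability into a shorter silence timer. The function name, the threshold, and the linear scaling are all assumptions made for illustration, not how the linked repo does it. In a real system `p_done` would come from the LLM.

```python
def endpoint_delay(
    p_done: float,
    base_delay: float = 1.0,   # hypothetical default silence timeout, seconds
    min_delay: float = 0.2,    # hypothetical floor when the model is certain
    threshold: float = 0.8,    # hypothetical cutoff for "high probability"
) -> float:
    """Return how long to wait in silence before replying, in seconds."""
    if not 0.0 <= p_done <= 1.0:
        raise ValueError("p_done must be a probability in [0, 1]")
    if p_done < threshold:
        # Not confident the user is done: keep the full timer.
        return base_delay
    # Scale linearly from base_delay (at the threshold) to min_delay (at 1.0).
    frac = (p_done - threshold) / (1.0 - threshold)
    return base_delay - frac * (base_delay - min_delay)
```

With these made-up defaults, a probability below 0.8 keeps the full one second wait, and the wait shrinks toward 0.2 seconds as the probability approaches 1.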

I don't know of any up-to-date models for #1, but I haven't checked in over a year.

TL;DR: the solution to problems involving AI models is more AI models.

1 comment

addandsubtract 412 days ago

I think 2 & 3 should be combined. The AI should just finish the current sentence (internally) before it's spoken, and once it reaches a high enough confidence, stick with the response. That's what humans do, too. We gather context and are able to think of a response while the other person is still talking.
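A minimal sketch of what combining 2 & 3 might look like: re-draft the reply on every incoming word, and lock it in once a confidence score crosses a threshold. Everything here is hypothetical. `draft_reply` and `p_done` stand in for real model calls, and `commit_at` is an arbitrary cutoff.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def speculative_reply(
    words: Iterable[str],
    draft_reply: Callable[[str], str],   # stand-in for the response model
    p_done: Callable[[str], float],      # stand-in for the confidence model
    commit_at: float = 0.9,              # hypothetical commit threshold
) -> Tuple[Optional[str], int]:
    """Return (reply, number of words heard when the reply was chosen)."""
    heard: List[str] = []
    draft: Optional[str] = None
    for word in words:
        heard.append(word)
        text = " ".join(heard)
        # Each new draft replaces (throws away) the previous one.
        draft = draft_reply(text)
        if p_done(text) >= commit_at:
            # Confident enough: stick with this response.
            return draft, len(heard)
    # Input ended without reaching the threshold: fall back to the last draft.
    return draft, len(heard)


# Toy stand-ins so the sketch runs without any model.
def toy_draft(text: str) -> str:
    return f"reply to: {text}"


def toy_p_done(text: str) -> float:
    return 0.95 if text.endswith("?") else 0.1


reply, n_words = speculative_reply(
    "what time is it? thanks".split(), toy_draft, toy_p_done
)
```

With the toy stand-ins, the reply is committed after the fourth word, before "thanks" arrives.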


com2kid 412 days ago

You use a smaller model for confidence because small models can return results quickly. Also, it keeps the AI from getting confused by trying to do too many things at once.
