


	by beiller 958 days ago
	It waits for sufficient silence to determine when to stop recording the voice and send it to the model. There are other modes in the source as well, along with ways to set the silence length so the audio is chunked and sent a bit at a time, but I imagine that is either a work in progress or not planned for this demo.
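
To illustrate the "wait for sufficient silence" idea, here is a minimal Python sketch of energy-based end-of-utterance detection. It is not the demo's actual code: the function names, the RMS threshold, and the one-second silence window are all assumptions chosen for illustration, and it expects mono float samples in the range -1.0 to 1.0.

```python
import math
from typing import Optional, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_utterance_end(
    samples: Sequence[float],
    sample_rate: int = 16000,      # assumed input rate
    frame_ms: int = 30,            # assumed analysis frame length
    rms_threshold: float = 0.01,   # hypothetical "this is silence" level
    min_silence_ms: int = 1000,    # hypothetical pause length that ends a turn
) -> Optional[int]:
    """Return the sample index where recording could stop, or None.

    Recording stops once speech has been heard and is then followed by
    at least min_silence_ms of consecutive quiet frames.
    """
    frame_len = sample_rate * frame_ms // 1000
    frames_needed = math.ceil(min_silence_ms / frame_ms)
    heard_speech = False
    silent_run = 0

    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) >= rms_threshold:
            # Loud frame: mark that speech started and reset the pause counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: count it toward the closing pause.
            silent_run += 1
            if silent_run >= frames_needed:
                return start + frame_len
    return None
```

Lowering `min_silence_ms` makes the detector cut in sooner; raising it gives the "waiting for very long pauses" behaviour. Leading silence before any speech is ignored, so the function returns `None` for a buffer that never contains speech.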


wahnfrieden 958 days ago

Thanks

I was surprised they didn’t combine this work with the streaming whisper demo. So I guess I will implement that for iOS/macOS (streaming whisper results in real time without waiting on an audio pause, but, as you say, using the audio pauses and other signals like punctuation in the result to determine when to run the LLM completion; it also makes me wonder about streaming whisper results into the LLM incrementally before it is ready for completion)
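
A rough Python sketch of that idea, combining pause length with trailing punctuation to decide when to hand the transcript to the LLM. Everything here is hypothetical: the `TurnDetector` class, its method names, and both pause thresholds are made up for illustration and are not part of whisper.cpp or any existing API.

```python
from dataclasses import dataclass

# Characters treated as "the sentence looks finished".
SENTENCE_END = (".", "?", "!")


@dataclass
class TurnDetector:
    # Hypothetical thresholds: a shorter pause is enough when the partial
    # transcript already ends in sentence punctuation.
    pause_ms_with_punct: int = 400
    pause_ms_without_punct: int = 1200
    text: str = ""

    def update(self, partial_text: str) -> None:
        """Replace the running transcript with the latest streaming result."""
        self.text = partial_text.strip()

    def should_complete(self, trailing_silence_ms: int) -> bool:
        """Decide whether to send the transcript to the LLM now."""
        if not self.text:
            return False
        if self.text.endswith(SENTENCE_END):
            return trailing_silence_ms >= self.pause_ms_with_punct
        return trailing_silence_ms >= self.pause_ms_without_punct
```

The streaming loop would call `update()` on every partial transcript and `should_complete()` with the current silence duration reported by whatever voice activity detector is in use.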


beiller 957 days ago

It may be using the streaming demo. The reason I can answer your question is that I had modified the streaming demo myself for personal use before. I think there are bugs in the silence detection code (as of a few months back; maybe fixed now). Maybe what we are seeing in this demo is just the "silence detection" setting waiting for very long pauses. I believe it's configurable.


wahnfrieden 957 days ago

I added libfvad
