HN Mirror



	by daanzu 1606 days ago
	"Everything other than talon has terrible latency": False! I develop kaldi-active-grammar (https://github.com/daanzu/kaldi-active-grammar), a free and open source speech recognition backend, which has extremely low latency. You can adjust how aggressive the VAD (voice activity detection) is to suit your preference, but the speech engine latency can be almost negligible, especially for voice commands (vs prose dictation). However, I agree that "most existing speech recognition engines were not designed with the kind of latency you want for quick one syllable commands", and that low latency is pivotal to being productive with voice commands. I also agree with your other points.
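	To illustrate the VAD trade-off mentioned above, here is a minimal sketch of an endpointer that declares "end of utterance" after a fixed run of silent frames. It is not kaldi-active-grammar's API; the function name, the frame size, and the `trailing_silence_ms` parameter are all hypothetical, chosen only to show why a more aggressive setting lowers command latency at the risk of cutting off a speaker who pauses.

```python
def endpoint_latency_ms(frames_is_speech, frame_ms, trailing_silence_ms):
    """Return the delay (ms) between the end of the last speech frame seen
    and the moment the endpointer fires, or None if it never fires.

    frames_is_speech: per-frame booleans from some upstream VAD (assumed).
    trailing_silence_ms: hypothetical "aggressiveness" knob; smaller fires sooner.
    """
    # Number of consecutive silent frames required (ceiling division, at least 1).
    needed = max(1, -(-trailing_silence_ms // frame_ms))
    silent_run = 0
    last_speech_end = None
    for i, is_speech in enumerate(frames_is_speech):
        if is_speech:
            silent_run = 0
            last_speech_end = (i + 1) * frame_ms
        elif last_speech_end is not None:
            silent_run += 1
            if silent_run >= needed:
                return (i + 1) * frame_ms - last_speech_end
    return None


# A short command followed by silence, in 30 ms frames.
command = [False] * 5 + [True] * 10 + [False] * 50
aggressive = endpoint_latency_ms(command, 30, 150)
relaxed = endpoint_latency_ms(command, 30, 600)
```

	In this toy model the added latency is simply the silence window rounded up to whole frames, which is why short windows suit one-syllable commands while prose dictation, with its mid-sentence pauses, usually wants a longer one.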

2 comments

tdj 1605 days ago

I built a similar app using a Kaldi nnet3 model running embedded. The thing was so responsive that our demo to an SVP went sideways: when he gave a query, the app responded almost immediately after the sentence ended. The SVP did not realize it had already responded, since the expectation for voice interaction systems was that an answer takes something like 2-5 seconds, and that gave the impression that the system was not working properly.

So, moral of the story: if you do too good a job of making a fast speech engine, especially for multi-turn dialogues, add some delay so it resembles human dialogue more.
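
One way to sketch the "add some delay" idea is a minimum response time: if the engine answers faster than a chosen floor, wait out the remainder before replying. Everything here is an assumption for illustration, including the function name and the 0.4 second default; the comment above does not say how long the delay should be or how it was implemented.

```python
import time


def respond_with_floor(handler, query, min_delay_s=0.4,
                       clock=time.monotonic, sleep=time.sleep):
    """Call handler(query), then pad the elapsed time up to min_delay_s.

    min_delay_s is a hypothetical tuning value, not a recommendation.
    clock and sleep are injectable so the behaviour can be tested without waiting.
    """
    start = clock()
    reply = handler(query)
    remaining = min_delay_s - (clock() - start)
    if remaining > 0:
        sleep(remaining)  # only pad when the engine was faster than the floor
    return reply
```

A slow answer is returned as soon as it is ready; only answers that arrive before the floor are held back.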

dataangel 1603 days ago

Sorry, should have said everything I have tried :)

At some point when I have enough free time I will have to take a look at this! Thanks for putting time into this kind of thing!