by daanzu 2196 days ago
I agree with everything you said, but I would add that a critical component of voice command and control is strict grammars. There is so much structure and context in what we say, and limiting what can be recognized to only what could reasonably be spoken in the current context can increase accuracy dramatically. (EDIT: ah, you edited to add a mention of this as well.)
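
To make the accuracy point concrete, here is a toy sketch in Python. The hypotheses, scores, and the `pick_in_grammar` helper are all made up for illustration and are not output or API from any real recognizer. A real grammar-based decoder would typically constrain the search itself rather than filter afterwards, but the effect is similar: acoustically plausible but nonsensical candidates never win.

```python
# Toy illustration only: hypotheses and scores are invented, not recognizer output.
def pick_in_grammar(nbest, allowed):
    """Return the best-scoring hypothesis that the grammar allows, or None."""
    candidates = [(text, score) for text, score in nbest if text in allowed]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]


# Unconstrained, the top hypothesis is a mis-recognition.
nbest = [
    ("clothes window", 0.41),
    ("close window", 0.38),
    ("close win dough", 0.21),
]

# Hypothetical grammar: the only phrases that make sense in this context.
allowed = {"close window", "open window", "next tab"}

result = pick_in_grammar(nbest, allowed)
print(result)
```

With the grammar applied, the higher-scoring "clothes window" is discarded because it is not something the user could reasonably have said here.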

And one shameless plug deserves another! Vosk is a great project, and my kaldi-active-grammar [0] (mentioned in another comment here) uses the same Kaldi engine, but extends it and is designed specifically for this use case. It supports defining many grammars, in any combination, and activating or deactivating them instantly, per utterance. I think it's probably a better fit as a backend for your project than Vosk. My work focuses on the backend technology, so it would be great to have more front ends using it to put it within users' reach (so to speak).
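
Roughly what I mean by activating grammars per utterance, as a plain Python sketch. To be clear, `GrammarSet` and its methods are hypothetical names for illustration and are not the kaldi-active-grammar API; the point is only the shape of the idea: define many grammars once, then choose the active subset before each utterance based on context.

```python
class GrammarSet:
    """Hypothetical container: many named grammars, any subset active at once."""

    def __init__(self):
        self._grammars = {}
        self._active = set()

    def define(self, name, phrases):
        # Each grammar here is just a set of allowed phrases (a big simplification).
        self._grammars[name] = frozenset(phrases)

    def set_active(self, names):
        # Called before each utterance; replaces the previously active subset.
        names = set(names)
        unknown = names - self._grammars.keys()
        if unknown:
            raise KeyError(f"undefined grammars: {sorted(unknown)}")
        self._active = names

    def allowed(self):
        # Union of all phrases from the currently active grammars.
        out = set()
        for name in self._active:
            out |= self._grammars[name]
        return out


grammars = GrammarSet()
grammars.define("global", ["go to sleep", "wake up"])
grammars.define("editor", ["save file", "delete line"])
grammars.define("browser", ["next tab", "go back"])

# Utterance 1: an editor window has focus.
grammars.set_active(["global", "editor"])
editor_phrases = grammars.allowed()

# Utterance 2: focus moved to a browser, so the active set changes.
grammars.set_active(["global", "browser"])
browser_phrases = grammars.allowed()
```

In this toy version switching is trivially cheap because the grammars are just sets; doing the same thing quickly with real decoding graphs is the hard part.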

[0] https://github.com/daanzu/kaldi-active-grammar