Hacker News
by daanzu 2313 days ago
This comment fails to mention Dragonfly with my Kaldi Active Grammar backend [1], which is cross-platform (Windows and Linux now, with Mac support functional and to be released soon), completely free with no private beta features (although I do accept donations), and 100% open source (unlike Talon). The speech recognition is local, with extremely low latency. See the video demonstration [2] on the project page. I think the underlying Kaldi engine delivers unmatched accuracy for a free, non-commercial engine.

I created Kaldi Active Grammar because I didn't trust relying on closed-source software for something so crucial to my productivity, where a decision by an outside party determines whether I can function. As a bonus, open source means I can make it fit my needs better than closed-source software ever could.

Furthermore, the original article mentions Caster (which is built on Dragonfly), but doesn't mention that KaldiAG works with it, and that work is underway to expand Caster's platform support.

[1] https://github.com/daanzu/kaldi-active-grammar

[2] https://youtu.be/Qk1mGbIJx3s

2 comments

I think that to state "unmatched accuracy" in good faith, we should come up with a common benchmark and measure against it. I believe there aren't really any clean benchmarks for command accuracy floating around (ideally we would use a strict grammar to properly measure the command decoders), and a wav2letter model holds the state of the art for LibriSpeech WER as of 2019.

I found your measurement here [1], which is against an unknown wav2letter acoustic+language model pair. The web demo runs an arbitrary model at any given point in time, since it is used to have users test in-progress models, and it has never run the model I am currently shipping with Talon.

(As a small example, the unfinished wav2letter experiment I am training right now has a 3.17% WER on speech commands and a 6.86% WER on LibriSpeech clean, both numbers without using a language model.)

[1] https://github.com/daanzu/kaldi-active-grammar/blob/master/d...
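
For readers unfamiliar with the metric, WER (word error rate) is usually computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. Below is a minimal sketch of that calculation, not the scoring code used by either project. The function names and the sample command phrases are hypothetical, and the normalization (lowercasing and whitespace splitting) is an assumption; real scoring tools may normalize differently.

```python
from typing import Iterable, List, Tuple


def _tokens(text: str) -> List[str]:
    # Assumed normalization: lowercase and split on whitespace.
    return text.lower().split()


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total edits over total reference words for (reference, hypothesis) pairs."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = _tokens(reference)
        errors += edit_distance(ref, _tokens(hypothesis))
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("need at least one reference word")
    return errors / ref_words


if __name__ == "__main__":
    # Hypothetical command utterances, not taken from any real benchmark.
    samples = [
        ("press control c", "press control see"),
        ("go to line ten", "go to line ten"),
    ]
    print(f"WER: {corpus_wer(samples):.2%}")
```

A shared benchmark along these lines would still need both engines to be scored on the same recordings with the same normalization, which is the hard part the thread is discussing.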

I am all for devising a good, fair, apples-to-apples comparison. If you have any suggestions, let me know. In lieu of that, I use what I have available. While accuracy numbers from papers are informative and interesting, I don't think they apply to our usage particularly well. I would prefer to use numbers from actual usage.

I realise this is a little off-topic, but FYI the bolding of so many words & phrases in the README for kaldi-active-grammar makes it really hard for me to read.