by lunixbochs 2313 days ago
I think to state "unmatched accuracy" in good faith, we should actually agree on a common benchmark and measure against it. I don't believe there are any clean benchmarks for command accuracy floating around (ideally we would use a strict grammar to properly measure the command decoders), and a wav2letter model holds the state of the art for LibriSpeech WER as of 2019.

I found your measurement here [1], which is against an unknown wav2letter acoustic + language model pair. The web demo runs an arbitrary model at any given point in time, since it is used to have users test in-progress models, and it has never run the model I am currently shipping with Talon.

(As a small example, the unfinished wav2letter experiment I am training right now has a 3.17% WER on speech commands and a 6.86% WER on LibriSpeech clean, both without using a language model.)
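
For anyone following along, WER here means the usual word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal sketch in Python, assuming plain whitespace tokenization and no text normalization; the function name is illustrative and is not taken from wav2letter, Kaldi, or Talon:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substituted word out of three reference words.
print(word_error_rate("open the door", "open a door"))
```

A real benchmark would also have to fix the normalization (case, punctuation, number formatting) and sum errors over the whole test set before dividing, which is part of why numbers from different setups are hard to compare.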

[1] https://github.com/daanzu/kaldi-active-grammar/blob/master/d...

1 comment

I am all for devising a good, fair, apples-to-apples comparison. If you have any suggestions, let me know. In lieu of that, I use what I have available. While accuracy numbers from papers are informative and interesting, I don't think they apply particularly well to our usage. I would prefer to use numbers from actual usage.