by nshm 2132 days ago
Try https://github.com/alphacep/vosk-api. It supports 10 languages, works on Android and RPi, and also has bigger, more accurate server models.

Other good ones are https://github.com/daanzu/kaldi-active-grammar and https://talonvoice.com/

There are toolkits for research like https://github.com/kaldi-asr/kaldi, https://github.com/espnet/espnet, wav2letter, Espresso, Nvidia/Nemo, https://github.com/didi/athena. You can try them too if you want to go deep. Some of them have interesting capabilities.

1 comments

I compared DeepSpeech v0.7.4 to Vosk on plain spoken English samples from male and female speakers. They seem to perform about the same when I use vosk-model-small-en-us-0.3 against the full-size DeepSpeech model.

When I use vosk-model-en-us-daanzu-20200328, the result is perfect on many of these tests, though it does no punctuation or capitalization apart from apostrophes. IIRC there is another project on GitHub that can add basic formatting, though.

I am quite surprised by Vosk's performance. It even handles odd words like Puget Sound well! I need to test out more accented audio on it, but this is quite exciting.
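
For anyone who wants to put a number on a comparison like this, one common option is word error rate (WER). Below is a minimal sketch, not what I actually ran for the tests above. It assumes English transcripts, and it lowercases and strips punctuation (keeping apostrophes) on both sides so that an engine that emits no punctuation or capitalization is not penalized for it. The function names `normalize` and `word_error_rate` are my own, not part of Vosk or DeepSpeech.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop everything except letters, digits, apostrophes
    and spaces, then split into words. English-only assumption."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

To use it, pass the known-correct transcript as `reference` and each engine's output as `hypothesis`, then compare the scores per sample. Lower is better, and 0.0 means every word matched after normalization.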