

by posguy 2132 days ago

Based on the testing I just did with Vosk, Mozilla DeepSpeech, Google Speech-to-Text and Microsoft Azure, I disagree with your argument that SaaS offerings give the best-quality results.

Mozilla DeepSpeech was definitely trailing the bleeding edge, but Vosk with the vosk-model-en-us-daanzu-20200328 model produces very accurate results, even on uncommon words, with performance similar to Google and Microsoft (Microsoft generally formats its output better than Google's STT).

Try it yourself:

Google: https://cloud.google.com/speech-to-text/ (see the "Put Speech-to-Text into action" section)

Microsoft: https://azure.microsoft.com/en-us/services/cognitive-service... (see "Upload File")

Vosk: https://alphacephei.com/vosk/
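If you want to make a side-by-side test like this repeatable, one common approach is to score each engine's transcript against a hand-checked reference using word error rate (WER): substitutions plus deletions plus insertions, divided by the number of reference words. Here is a minimal Python sketch. It assumes you have already copied plain-text transcripts out of each service. The function names and sample sentences are hypothetical, and this is not necessarily how the comparison above was scored. Punctuation and case are stripped first, so formatting differences between engines don't count as recognition errors.

```python
import re


def normalize(text):
    # Lowercase and replace punctuation (except apostrophes) with spaces, so that
    # "Hello, world." and "hello world" produce the same word list.
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N, computed with a word-level Levenshtein distance."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hypothetical transcripts of the same audio clip from two engines.
reference = "The quick brown fox jumps over the lazy dog."
transcripts = {
    "engine_a": "the quick brown fox jumps over the lazy dog",
    "engine_b": "The quick brown fox jumped over a lazy dog.",
}
for name, text in transcripts.items():
    print(f"{name}: WER = {word_error_rate(reference, text):.2%}")
```

A lower WER is better. Use several clips with uncommon words rather than a single sentence, because one short sample says little about an engine's overall accuracy.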

Had Mozilla given the DeepSpeech team 4x to 8x more GPU resources and more staff, its STT would likely be competitive. Other small STT developers can iterate and test much faster because they have more hardware at their disposal.