| HN Mirror

You properly mentioned timestamps. There are many other important properties of good ASR system like vocabulary adaptability (if you can introduce new words) or streaming. Or confidences. Or latency of the output. Compared to Vosk models this model can not work in streaming manner, so not very suitable for real-time applications.

But in general the model is robust and accurate and trained on the amount of speech we never dreamed about in Vosk. We will certainly benefit from this model as a teacher (together with others like gigaspeech models). I recently wrote about it https://alphacephei.com/nsh/2022/06/14/voting.html