by goffi
1362 days ago
Really interesting, I can see a ton of potential uses. Two questions: 1) How does it compare to state-of-the-art FOSS solutions? I'm thinking of DeepSpeech or Vosk. 2) Would it be possible to associate timestamps with the recognized words? That would be amazing for things such as audio editing or skipping to a particular location in a video.
|
But in general the model is robust and accurate, and it was trained on an amount of speech we never dreamed of in Vosk. We will certainly benefit from this model as a teacher (together with others like the GigaSpeech models). I recently wrote about it: https://alphacephei.com/nsh/2022/06/14/voting.html