| HN Mirror



	by crakenzak 1122 days ago
	according to their blog post[1], MMS achieves ~half the word error rate of Whisper while supporting 11x more languages. pretty impressive. [1] https://ai.facebook.com/blog/multilingual-model-speech-recog...

2 comments

youssefabdelm 1122 days ago

I wonder what the performance is on English specifically.

Edit: Just checked the paper, it seems to be worse[1][2] but feel free to correct me.

I feel like they should've just taken the Whisper architecture, scaled it, and scaled the dataset as they did.

[1] Page: https://i.imgur.com/bq15Tno.png

[2] Paper: https://scontent.fcai19-5.fna.fbcdn.net/v/t39.8562-6/3488279...

sacred_numbers 1122 days ago

It's worse on English and a lot of other common languages (see Appendix C of the paper). It does better on less common languages like Latvian or Tajik, though.

hackernewds 1122 days ago

Which implies Whisper just hasn't focused on those languages? Seems disingenuous to claim that the error rate has halved when it's worse on the apex language.

whimsicalism 1122 days ago

My guess is wav2vec performs better on low-resource languages than Whisper.

93po 1122 days ago

lack of labels on graph axes should be a crime
