|
|
|
|
|
by espadrine
329 days ago
|
|
The best model there is 2.5B parameters. I can believe that a model 10x bigger is somewhat better. One element of comparison is OpenAI Whisper v3, which achieves 7.44 WER on the ASR leaderboard, and shows up as ~8.3 WER on FLEURS in the Voxtral announcement[0]. If FLEURS has +1 WER on average compared to ASR, it would imply that Voxtral does have a lead on ASR. [0]: https://mistral.ai/news/voxtral |
|
Also note that, Voxtral's capacity is not necessarily all devoted to speech, since it "Retains the text understanding capabilities of its language model backbone"