
For these tasks and languages, SeamlessM4T achieves state-of-the-art results across nearly 100 languages and supports multiple tasks in a single model: automatic speech recognition, speech-to-text, speech-to-speech, text-to-speech, and text-to-text translation. We also significantly improve performance for the low- and mid-resource languages it supports while maintaining strong performance on high-resource languages.

To evaluate the system more accurately without depending on text-based metrics, we extended our text-less metric into BLASER 2.0, which now enables evaluation across both speech and text units with accuracy similar to that of its predecessor. When tested for robustness, our system handles background noise and speaker variation in speech-to-text tasks better than the current state-of-the-art model (average improvements of 37% and 48%, respectively).

SeamlessM4T also outperforms previous state-of-the-art competitors.