
Mozilla DeepSpeech is rapidly maturing, but it needs thousands of hours of validated audio data to train each language. It's a feat that they can achieve a 5.97% word error rate with only 2000 hours of audio.

Baidu had 5000 hours of audio data to train their DeepSpeech and DeepSpeech 2 models, while Google, Microsoft and IBM have people constantly giving them fresh audio to train and validate their models with.

Firefox Voice data should help rapidly expand the Common Voice audio corpus beyond the 1492 hours it currently contains: https://commonvoice.mozilla.org/en/datasets