by metildaa 2744 days ago
Baidu trained their DeepSpeech model with 6000 hours of English to get a model similarly accurate to Google's/Microsoft's. It may just be that the type of quick model you're using needs 10k hours to achieve good results.

Mozilla's DeepSpeech is quite interesting: languages like Turkish can get a decently usable (~20% WER) model with just 80 hours of training data (no transfer learning, starting from a clean slate).

1 comment

Yep, all good points. One thing to consider is that generalization is a big problem. It's easy to get good results on a specific dataset nowadays (around 5-10% word error rate on academic datasets), but that same model might hit 40% WER on data in the wild.
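
For anyone unsure what the WER figures in this thread measure: word error rate is usually defined as the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model's output, divided by the number of reference words. Here is a minimal sketch, assuming plain whitespace tokenization and no text normalization; the function name and the example sentences are made up for illustration and are not from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace, with no casing or
    punctuation normalization (real evaluations usually normalize first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion
    # against a six-word reference.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Note that WER can exceed 100% when the output contains many inserted words, and that scoring the same model on an in-domain test set and on separately collected "in the wild" audio is the simplest way to see the generalization gap described above.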