by stephensonsco 2744 days ago
If you're training from scratch, you need around 10k hours to get a good model, but with transfer learning you don't need nearly that much (100 hours gets you a lot).

We excel in phone call and meeting settings, i.e. the typical sales/office/support environment.

1 comment

Baidu trained their DeepSpeech model with 6000 hours of English to get a model similarly accurate to Google/Microsoft. It may just be that the type of quick model you're using needs 10k hours to achieve good results.

Mozilla's DeepSpeech is quite interesting: languages like Turkish can get a decently usable (~20% WER) model with just 80 hours of training data (no transfer learning, starting from a clean slate).

Yep, all good points. One thing to consider is that generalization is a big problem. It's easy to get good results on a specific dataset nowadays (around 5-10% word error rate on academic datasets), but that same model might hit 40% WER on data in the wild.
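
For anyone unfamiliar with the metric, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model's output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are made up for illustration, and real evaluations usually normalize case and punctuation first, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# Toy example (invented sentences): one dropped word and one extra word
# against a 5-word reference.
wer = word_error_rate(
    "please call me back tomorrow",
    "please call back tomorrow morning",
)
print(f"WER: {wer:.0%}")
```

Note that because insertions count as errors, WER can exceed 100% when the output is much longer than the reference.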