by nl
3377 days ago
This is very good - I've never seen an RNN-for-speech model in TensorFlow before. Note, though, that the real problem here is the lack of training data. In a recent podcast I heard that the Baidu speech recognition team uses "small models" trained on 10,000 hours of speech. I forget how big the production-quality models were, but it was at least 5 times that. This model uses ~1500 hours[1]. It's very impressive that it does as well as it does using just that.

[1] https://svds.com/tensorflow-rnn-tutorial/
Their examples use much less data: just 5 utterances from the LibriSpeech training set. That is perfectly fine for a tutorial, since training on 1,500 hours of speech data takes anywhere from several days to multiple weeks, depending on your hardware.
[edit: IMHO, the tutorial from the Bay Area DL School is more useful for getting started: https://github.com/baidu-research/ba-dls-deepspeech]