by nl 3377 days ago
This is very good - I've never seen an RNN-for-speech model in TensorFlow before.

Note, though, that the real problem here is the lack of training data.

In a recent podcast I heard that the Baidu speech recognition team trains its "small models" on 10,000 hours of speech. I forget how much data the production-quality models used, but it was at least 5 times that.

This model uses ~1500 hours[1]. It's very impressive that it does as well as it does with just that.
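
To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. The variable names are mine, and the production figure is only the "at least 5 times" lower bound from the podcast as I remember it, so treat it as an assumption rather than a reported number.

```python
# Hours of training speech quoted in this comment.
tutorial_hours = 1_500        # approximate figure for the tutorial's model
baidu_small_hours = 10_000    # Baidu's "small model" figure from the podcast

# Assumption: "at least 5 times that" is only a lower bound, not an exact figure.
baidu_production_min_hours = 5 * baidu_small_hours


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


small_pct = share(tutorial_hours, baidu_small_hours)
prod_pct = share(tutorial_hours, baidu_production_min_hours)

print(f"vs. Baidu 'small' model: {small_pct:.0f}% of the data")
print(f"vs. Baidu production lower bound: at most {prod_pct:.0f}% of the data")
```

So the tutorial's model sees a small fraction of the "small model" data, and a much smaller fraction still of whatever the production models use.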

[1] https://svds.com/tensorflow-rnn-tutorial/

1 comment

Mozilla DeepSpeech has been around for a while (https://github.com/mozilla/DeepSpeech). In fact, this code looks like it was heavily copied from there (see the attribution notices in the comments).

Their examples use far less data: just 5 utterances from the LibriSpeech training set. That is perfectly fine for a tutorial, since training on 1500 hours of speech takes anywhere from several days to multiple weeks, depending on your hardware.

[edit: IMHO, the tutorial from the Bay Area DL School is more useful for getting started: https://github.com/baidu-research/ba-dls-deepspeech]