If you have a NVIDIA GPU with CUDA already installed
It's unfortunate OpenCL mostly failed to gain mindshare. It would've let anyone with a powerful video card enjoy the benefits of acceleration, rather than half the market.
Tensorflow and Theano themselves only really support Windows. Unless you have a spare 5 hours to set up (and more hours to maintain), you NEED Linux for any serious ML.
Note: I'm just a beginner at ML, but this was my experience setting things up for the first time.
Tensorflow and Theano themselves only really support Windows.
I think you mean "only really support Linux"? The rest of your comment reads like you know that.
TensorFlow at least has now begun supporting Windows in the main release, but you are absolutely correct in saying it has much better support on Linux.
Hey ReverseCold, I had similar experiences to you when I first tried TensorFlow ~1 yr ago, but since then, the ease of `pip install tensorflow` in a virtualenv has made it fast and relatively straightforward to use TensorFlow with CPU-only on either Mac or Ubuntu (I haven't tried any other Linux distro myself). I agree though, getting everything to work with GPU is a bit of a pain :)
By all metrics in almost all markets, Nvidia is dominant in high-end GPU sales; it isn't a 50/50 market. On Steam they are at 60% vs AMD's 23% (http://store.steampowered.com/hwsurvey). All major cloud computing providers deploy Nvidia GPUs, and in scientific computing and machine learning they own close to 100% of the market. AMD is only really dominant in game consoles, which is the lowest-margin GPU market. OpenCL just doesn't have the software support that CUDA has for scientific computing, and much of that is because Nvidia actively works to support that community.
Then it's unfortunate Apple decided to integrate AMD and Intel cards into MacBook Pros instead of Nvidia. Many ML researchers probably use MBPs but can't leverage acceleration.
Eh, it's not really that consequential, because anything big will need way more horsepower than you're gonna get on any mobile GPU to be done in a reasonable amount of time. We built a CLI tool for running our stuff on AWS and on our gaming/ML desktop at the office, specifically because everyone is on laptops and training or evals are so slow there.
It really is. I stopped playing around with CUDA precisely because Apple dropped Nvidia GPUs. Granted, for anything serious you want something other than a laptop, but it's still nice for quick prototyping (I'm referring to CUDA in general, not just using it for ML).
This is very good - I've never seen an RNN-for-speech-in-TensorFlow model before.
Note, though, that the real problem here is the lack of training data.
In a recent podcast I heard that the Baidu speech recognition team uses "small models" of 10,000 hours of speech. I forget how big the production quality models were, but it was at least 5 times that.
This model uses ~1500 hours[1]. It's very impressive it does as well as it does just using that.
Mozilla DeepSpeech has been around for a while (https://github.com/mozilla/DeepSpeech). In fact, this code looks like it was heavily copied from there (see attribution notices in comments).
Their examples use much less data, just 5 utterances from the Librispeech training set. Which is perfectly fine for a tutorial, since training on 1500h worth of speech data takes from several days to multiple weeks, depending on your hardware.
Is there a way you can decode arbitrary wav files by cloning the repo after you train it? I couldn't find out whether it was capable by reading the tutorial and README.
Hey Aetherspawn, the repo does not currently have code for decoding individual .wav files that are not in the train/dev/test sets. We'll polish up our code that simplifies decoding and add it to the repo soon, then shoot you a message.
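As a rough illustration of the first step any such decoding would need, here is a minimal sketch of reading an arbitrary .wav file into normalized samples using only Python's standard library. The function name `read_wav_mono` is hypothetical and not part of the repo, and the sketch assumes 16-bit PCM audio. Feature extraction and the model-specific decoding are not shown, since the repo does not provide them yet.

```python
import struct
import wave


def read_wav_mono(path):
    """Return (sample_rate, samples) with samples as floats in [-1.0, 1.0).

    Assumption: the file is 16-bit PCM. Multi-channel audio is reduced
    to its first channel.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # Each sample is a little-endian signed 16-bit integer.
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Frames are interleaved by channel, so stride over them to keep channel 0.
    first_channel = ints[::channels]
    return rate, [s / 32768.0 for s in first_channel]
```

The returned samples would still need to be converted into whatever input features the trained model expects before they could be decoded.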
Thank you, this is very interesting.
I wonder about your initial train config: with time in mind for demo purposes, wouldn't it be more efficient to use more wav samples with fewer epochs?
I agree that it would be more efficient to have more wav files in the GitHub repo, but we kept them minimal to reduce the total file size when cloning the repository. You can find more of the LibriSpeech data here: http://www.openslr.org/12/
We kept the epochs at 100 to demonstrate the negative consequence of overfitting training data, when doing the test or dev set evaluations. We could probably reduce that to ~50 though to save time in training :)
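One common way to pick a smaller epoch count is to watch the dev-set loss and stop once it no longer improves (early stopping). The sketch below is not taken from the tutorial's code. The function name `best_epoch`, the `patience` parameter, and the loss values in the usage note are all illustrative assumptions.

```python
def best_epoch(dev_losses, patience=5):
    """Return the 1-based epoch with the lowest dev loss seen before stopping.

    Scanning stops once the dev loss has failed to improve for `patience`
    consecutive epochs. Returns 0 for an empty list.
    """
    best_loss = float("inf")
    best = 0
    since_improvement = 0
    for epoch, loss in enumerate(dev_losses, start=1):
        if loss < best_loss:
            # New best dev loss: remember it and reset the counter.
            best_loss, best = loss, epoch
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                break
    return best
```

For example, with made-up dev losses `[1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7, 0.9]` and `patience=3`, the function returns 4: the dev loss bottoms out at epoch 4 and rises afterwards, the pattern that running all 100 epochs is meant to demonstrate.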
Cool tutorial. Thanks for sharing!