Recurrent Neural Networks - A Short TensorFlow Tutorial

	Recurrent Neural Networks - A Short TensorFlow Tutorial (github.com)
	194 points by mrubashkin 3377 days ago

5 comments

sillysaurus3 3377 days ago

If you have a NVIDIA GPU with CUDA already installed

It's unfortunate OpenCL mostly failed to gain mindshare. It would've let anyone with a powerful video card enjoy the benefits of acceleration, rather than half the market.

Cool tutorial. Thanks for sharing!

nikcub 3377 days ago

I don't think anybody is opposed to Tensorflow on OpenCL, it's just that CUDA is so common.

You can follow the issue to add OpenCL support here:

https://github.com/tensorflow/tensorflow/issues/22

and one of the projects here:

https://github.com/benoitsteiner/tensorflow-opencl

From what I understand, it requires Linux at the moment, since it is built on ComputeCpp.

ReverseCold 3377 days ago

Tensorflow and Theano themselves only really support Windows. Unless you have a spare 5 hours to set up (and more hours to maintain), you NEED Linux for any serious ML.

Note: I'm just a beginner at ML, but this was my experience setting things up for the first time.

nl 3377 days ago

Tensorflow and Theano themselves only really support Windows.

I think you mean "only really support Linux"? The rest of your comment reads like you know that.

TensorFlow at least has now begun supporting Windows in the main release, but you are absolutely correct in saying it has much better support on Linux.

mrubashkin 3377 days ago

Hey ReverseCold, I had similar experiences to you when I first tried TensorFlow ~1 yr ago. Since then, though, the ease of `pip install tensorflow` in a virtualenv has made it fast and relatively straightforward to use TensorFlow CPU-only on either Mac or Ubuntu (I haven't tried any other Linux distro myself). I agree that getting everything to work with a GPU is still a bit of a pain :)
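A quick way to confirm that the CPU-only setup described above is in place is a short standard-library Python check. This is a sketch that assumes Python 3. The helper name `environment_report` is hypothetical. It only reports whether the interpreter is running inside a virtualenv and whether a `tensorflow` package can be found. It does not test GPU support.

```python
import importlib.util
import sys


def environment_report():
    """Report whether we're in a virtualenv and whether TensorFlow is findable."""
    # venv / modern virtualenv: sys.prefix is the env, sys.base_prefix is the
    # interpreter it was created from. Legacy virtualenv set sys.real_prefix.
    in_venv = hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix
    # find_spec locates the package without importing it; None means not installed.
    tf_installed = importlib.util.find_spec("tensorflow") is not None
    return {"in_virtualenv": in_venv, "tensorflow_installed": tf_installed}


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```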

Impossible 3377 days ago

By all metrics in almost all markets, Nvidia is dominant in high-end GPU sales; it isn't a 50/50 market. On Steam they are at 60% vs. AMD's 23% (http://store.steampowered.com/hwsurvey). All major cloud computing providers deploy Nvidia GPUs, and in scientific computing and machine learning they own close to 100% of the market. AMD is only really dominant in game consoles, which is the lowest-margin GPU market. OpenCL just doesn't have the software support that CUDA has for scientific computing, and much of that is because Nvidia actively works to support that community.

sillysaurus3 3377 days ago

Then it's unfortunate Apple decided to integrate AMD and Intel cards into MacBook Pros instead of Nvidia. Many ML researchers probably use MBPs but can't leverage acceleration.

ma2rten 3377 days ago

You wouldn't wanna run long-running models that take days to train on your laptop anyway.

CoolGuySteve 3376 days ago

No but you would want to develop them on a truncated set of the data without being tethered to the internet.

It's a reasonable use case.

boxcardavin 3377 days ago

Eh, it's not really that consequential, because anything big will need way more horsepower than you're gonna get on any mobile GPU to be done in a reasonable amount of time. We built a CLI tool for our stuff on AWS, and our gaming/ML desktop at the office, specifically because everyone is on laptops and training or evals are so slow.

cstejerean 3377 days ago

It really is. I stopped playing around with CUDA precisely because Apple dropped Nvidia GPUs. Granted, for anything serious you want something other than a laptop, but it's still nice for quick prototyping (I'm referring to CUDA in general, not just using it for ML).

mrubashkin 3377 days ago

Thanks for the comment! I haven't read much about OpenCL (https://en.wikipedia.org/wiki/OpenCL), and I'm looking forward to learning more :)

nl 3377 days ago

This is very good - I've never seen an RNN-for-speech-in-Tensorflow model before.

Note, though, that the real problem here is the lack of training data.

In a recent podcast I heard that the Baidu speech recognition team uses "small models" trained on 10,000 hours of speech. I forget how much data the production-quality models used, but it was at least 5 times that.

This model uses ~1500 hours[1]. It's very impressive it does as well as it does just using that.

[1] https://svds.com/tensorflow-rnn-tutorial/
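The size of the gap can be put in numbers using only the figures quoted in this comment: 10,000 hours for Baidu's "small models", at least 5 times that for production, and ~1500 hours for this model. The constant and function names below are illustrative only.

```python
# Figures quoted in the comment; the production size is a lower bound
# ("at least 5 times" the small models), so its fraction is an upper bound.
BAIDU_SMALL_MODEL_HOURS = 10_000
BAIDU_PRODUCTION_MIN_HOURS = 5 * BAIDU_SMALL_MODEL_HOURS  # >= 50,000 h
TUTORIAL_HOURS = 1_500


def fraction_of(reference_hours, hours=TUTORIAL_HOURS):
    """Share of a reference training set covered by `hours` of speech."""
    return hours / reference_hours


print(f"vs. small models:      {fraction_of(BAIDU_SMALL_MODEL_HOURS):.0%}")     # 15%
print(f"vs. production (max):  {fraction_of(BAIDU_PRODUCTION_MIN_HOURS):.0%}")  # 3%
```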

woodson 3376 days ago

Mozilla Deepspeech has been around for a while (https://github.com/mozilla/DeepSpeech). In fact, this code looks like it was heavily copied from there (see attribution notices in comments).

Their examples use much less data: just 5 utterances from the Librispeech training set. That is perfectly fine for a tutorial, since training on 1500h worth of speech data takes anywhere from several days to multiple weeks, depending on your hardware.

[edit: IMHO, the tutorial from the Bay Area DL School is more useful for getting started: https://github.com/baidu-research/ba-dls-deepspeech]

posterboy 3376 days ago

https://github.com/baidu-research/ba-dls-deepspeech is 404

aetherspawn 3377 days ago

Is there a way to decode arbitrary wav files after cloning the repo and training the model? I couldn't tell whether it was capable of that from reading the tutorial and README.

mrubashkin 3377 days ago

Hey Aetherspawn, the repo does not currently have code for decoding individual .wav files that are not in the train/dev/test sets. We'll polish up our code that simplifies decoding, add it to the repo soon, and then shoot you a message.
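Decoding an arbitrary file would start with loading its audio. The sketch below covers only that first step, using Python's standard `wave` module. It is not the repo's decoding code. It assumes 16-bit mono PCM WAV input, which is typical for LibriSpeech-derived audio at 16 kHz. Feature extraction and running the trained model depend on the repo and are not shown. The function name `load_wav_as_floats` is hypothetical.

```python
import array
import sys
import wave


def load_wav_as_floats(path):
    """Read a 16-bit mono PCM WAV file into floats in [-1.0, 1.0).

    Returns (samples, sample_rate); raises ValueError for other formats.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h", raw)  # interpret bytes as signed 16-bit integers
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Scale by 2**15 so -32768 maps to -1.0 and 32767 to just under 1.0.
    return [s / 32768.0 for s in pcm], sample_rate
```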

iplaman 3377 days ago

Thank you, this is very interesting. I wonder about your initial training config: for demo purposes, wouldn't it be more time-efficient to use more wav samples with fewer epochs?

mrubashkin 3377 days ago

I agree that it would be more efficient to have more wav files in the github repo, but we kept them minimal to reduce the total file size when cloning the repository. You can find more of the Librispeech data here: http://www.openslr.org/12/

We kept the epochs at 100 to demonstrate the negative consequences of overfitting the training data when doing the test or dev set evaluations. We could probably reduce that to ~50 though to save time in training :)
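A common alternative to a fixed epoch count is early stopping. Per the comment above, this is not what the repo does. With early stopping, you track the dev-set loss after each epoch and stop once it has failed to improve for a few epochs in a row. A minimal Python sketch follows. The function name `best_epoch` and the default `patience=5` are assumptions for illustration.

```python
def best_epoch(dev_losses, patience=5):
    """Return the 0-based epoch with the lowest dev loss, or None if empty.

    Scanning stops once `patience` consecutive epochs fail to beat the best
    dev loss seen so far, a typical sign the model is overfitting.
    """
    best, best_idx, since_best = float("inf"), None, 0
    for epoch, loss in enumerate(dev_losses):
        if loss < best:
            best, best_idx, since_best = loss, epoch, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # dev loss has stalled; later epochs are ignored
    return best_idx
```

With toy values, `best_epoch([3.0, 2.0, 1.5, 1.6, 1.7, 1.8], patience=2)` returns `2`. The dev loss bottoms out at the third epoch, and the scan stops after two epochs without improvement.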

abainbridge 3376 days ago

Is this speech-to-text or text-to-speech or something else?
