by pwaivers 3326 days ago
As far as I understand it, Facebook put a lot of research into optimizing one type of neural network (the CNN), while everyone else is using another type, the RNN. Up until now, CNNs were faster but less accurate. However, FB has improved CNNs to the point where they can compete on accuracy, particularly in speech recognition. And most importantly, they are releasing the source code and papers. Does that sound right?

Can anyone else give us an ELI5?


I'll give it a shot.

Traditional neural networks work like this: you have k inputs to a layer and j outputs, so you have O(k * j) parameters, effectively multiplying the inputs by the parameters to get the outputs. If you have lots of inputs to each layer, and lots of layers, you have a lot of parameters. Too many parameters = overfitting to your training data pretty quickly. But ideally you want big networks, to get high accuracy. So the question is how to reduce the number of parameters while still having the same 'power' in the network.
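To make the parameter count concrete, here is a minimal pure-Python sketch of a fully connected layer. The names `dense_param_count` and `dense_forward` are made up for illustration, and biases are left out so the count matches the O(k * j) figure above.

```python
def dense_param_count(k, j):
    """Number of weights in a fully connected layer with k inputs and j outputs (no biases)."""
    return k * j


def dense_forward(inputs, weights):
    """Compute every output as a weighted sum of ALL inputs.

    weights[o][i] is the weight connecting input i to output o.
    """
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# A layer with 3 inputs and 2 outputs needs 3 * 2 = 6 weights.
weights = [
    [1.0, 0.0, -1.0],
    [0.5, 0.5, 0.5],
]
outputs = dense_forward([2.0, 4.0, 6.0], weights)

# The count grows quickly: 1000 inputs and 1000 outputs is a million weights.
big_layer = dense_param_count(1000, 1000)
```

Even this toy layer needs one weight per input-output pair, which is why stacking wide layers gets expensive.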

CNNs (Convolutional Neural Networks) solve this problem by tying weights together. Instead of connecting every input to every output, you build a small set of functions at each layer, each with a small number of parameters, and apply them to small groups of nearby inputs. Images are the best way to describe this: a function takes a small (3x3 or 5x5) group of pixels as input and outputs a single result, and the same function is applied all over the image. Picture a little 5x5 box moving around the image, running the function at each stop.

This has given some pretty incredible results in the image-recognition problem space, and they're super simple to train.
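The sliding-box idea can be sketched in a few lines of pure Python. This is only an illustration under simple assumptions (one kernel, no padding, stride 1, a hypothetical helper named `conv2d_valid`), not how any real library implements it. Like most deep-learning code, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # The same kernel weights are reused at every (r, c) stop.
            total = 0.0
            for dr in range(kh):
                for dc in range(kw):
                    total += kernel[dr][dc] * image[r + dr][c + dc]
            row.append(total)
        out.append(row)
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
feature_map = conv2d_valid(image, kernel)

# Weight tying: the kernel has 9 weights no matter how large the image is.
kernel_params = len(kernel) * len(kernel[0])
# A fully connected layer mapping the same 16 inputs to the same 4 outputs
# would need one weight per input-output pair.
n_inputs = len(image) * len(image[0])
n_outputs = len(feature_map) * len(feature_map[0])
dense_params = n_inputs * n_outputs
```

Here the shared kernel uses 9 weights where the fully connected equivalent would use 64, and the gap widens as the image grows.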

Another approach, Recurrent Neural Networks (RNNs), turns the model around in a different way. Instead of having a long list of inputs that all come at once, it takes each input one at a time (or maybe a group at a time, same idea) and runs the neural-network machinery to build up to a single answer. So you might feed it one word at a time of input in English, and after a few words, it starts outputting one word at a time in French until the inputs run out and the output says it's the end of the sentence.
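A rough sketch of the "one input at a time" idea, using a single scalar hidden state to keep it readable. The function names and the weights `w_h`, `w_x`, and `b` are arbitrary values chosen for illustration, not trained parameters, and this covers only the reading side (building up a state), not the part that emits French words.

```python
import math


def rnn_step(h_prev, x, w_h, w_x, b):
    """One recurrent step: mix the previous state with the new input."""
    return math.tanh(w_h * h_prev + w_x * x + b)


def rnn_encode(sequence, w_h=0.5, w_x=1.0, b=0.0):
    """Feed the inputs in one at a time, carrying the hidden state forward."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(h, x, w_h, w_x, b)
        states.append(h)
    return h, states


# The same three weights handle a sequence of any length.
final, states = rnn_encode([1.0, -1.0, 0.5])
long_final, long_states = rnn_encode([0.1] * 50)
```

The final state plays the role of the "single answer" built up from the whole sequence.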

What Facebook is doing is applying CNNs to text-sequence and translation problems. It seems to me that what they have here is kind of an RNN-CNN hybrid.

Caveats: I'm an idiot! I just read a lot and play around with ML, but I'm not an expert. Please correct me if I'm wrong, smarter people, by replying.

> Please correct me if I'm wrong, smarter people, by replying.

You are not an idiot; maybe not an expert, but definitely not an idiot. Your description is easy to follow for someone without knowledge of the field. I would add only that RNNs are called recurrent because they have recurrent connections: the hidden state computed at one step of the sequence is fed back in at the next step. That is why they are hard to parallelize. You need the output from one step to compute the output at the next, so you cannot process the whole sequence at once. This doesn't happen in a CNN.

That's a great explanation.

Let me add this though:

Artificial neural networks were first proposed to compute the probability of a sequence of words occurring. RNNs were the next step in natural language processing because, unlike the previously proposed architecture, they can take variable-length sequences as input.

However, a simple RNN architecture can't capture long-term dependencies (that is, use an idea developed earlier in the text to help predict a later word sequence). So two kinds of fancy RNN architectures were developed to tackle this problem: GRUs and LSTMs. Production systems are already implementing these architectures and they are yielding pretty accurate results.
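As a rough illustration of how gating helps with long-term dependencies, here is a scalar GRU step in pure Python. The parameter names and values are hand-picked assumptions, not trained weights, and the update rule follows one common convention (some write-ups swap the roles of `z` and `1 - z`).

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(h_prev, x, p):
    """One scalar GRU step; p is a dict of hypothetical weights."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])  # reset gate
    candidate = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # z near 0 keeps the old state; z near 1 overwrites it with the candidate.
    return (1.0 - z) * h_prev + z * candidate


def run(h0, xs, p):
    h = h0
    for x in xs:
        h = gru_step(h, x, p)
    return h


base = {"w_z": 0.0, "u_z": 0.0, "w_r": 0.0, "u_r": 0.0, "b_r": 0.0,
        "w_h": 1.0, "u_h": 1.0, "b_h": 0.0}
xs = [5.0, -3.0, 2.0] * 10  # 30 steps of distracting input

# Update gate held almost shut: the initial state survives all 30 steps.
h_closed = run(0.5, xs, dict(base, b_z=-20.0))
# Update gate held almost open: the state is overwritten at every step.
h_open = run(0.5, xs, dict(base, b_z=20.0))
```

In a trained network the gates are computed from the data, so the model can learn when to hold on to information and when to replace it.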

But now Facebook researchers are proposing using CNNs for this task because this architecture can take more advantage of GPU parallelism.
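A small sketch of why the convolutional approach parallelizes better. Each convolution output depends only on a fixed window of inputs, so the positions can be computed in any order (or all at once on a GPU), while each recurrent state has to wait for the one before it. The helper names and weights here are illustrative assumptions, and this is a toy 1D example, not Facebook's model.

```python
import math


def conv1d_at(xs, kernel, t):
    """Output at position t depends only on xs[t : t + len(kernel)]."""
    return sum(k * x for k, x in zip(kernel, xs[t:t + len(kernel)]))


def rnn_states(xs, w_h=0.5, w_x=1.0):
    """Each state needs the previous state, so the loop is inherently sequential."""
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        out.append(h)
    return out


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 0.0, -0.5]
positions = range(len(xs) - len(kernel) + 1)

# Convolution: compute the positions front to back...
in_order = [conv1d_at(xs, kernel, t) for t in positions]
# ...or back to front. The results are identical because no position
# depends on another position's output.
computed_backwards = {t: conv1d_at(xs, kernel, t) for t in reversed(positions)}
any_order = [computed_backwards[t] for t in positions]

# Recurrence: changing only the first input changes the last state,
# so the last state cannot be computed before the earlier ones.
last_a = rnn_states([1.0, 2.0, 3.0])[-1]
last_b = rnn_states([0.0, 2.0, 3.0])[-1]
```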

Not an expert, but as I understand it, common practice (everywhere, not just at Facebook) is to use CNNs for understanding images and other kinds of non-sequential data. RNNs are commonly used for handling text and other kinds of sequential data.

They showed how to use a CNN with text to get a speed boost, even though that's not how it's normally been done.