by modeless 3540 days ago
The reason other researchers haven't jumped on NTMs may be that, unlike commonly-researched types of neural nets such as CNNs or RNNs, NTMs are not currently the best way to solve any real-world problem. The problems they have solved so far are relatively trivial, and they are very inefficient, inaccurate, and complex relative to traditional CS methods (e.g. Dijkstra's algorithm coded in C).
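For a sense of how compact the traditional approach is, here is a minimal sketch of Dijkstra's algorithm. It is written in Python rather than C purely for brevity, and the function name `dijkstra`, the adjacency-list format, and the toy graph are assumptions made up for illustration, not anything from the NTM papers.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source.

    graph maps each node to a list of (neighbor, weight) pairs,
    with non-negative weights.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return dist

# Hypothetical toy graph, made up for illustration.
toy_graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 5)],
    "c": [("d", 1)],
    "d": [],
}
distances = dijkstra(toy_graph, "a")
```

The whole thing is a couple dozen lines, exact, and runs in O((V + E) log V) time, which is the bar a learned approach would need to clear on this kind of task.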

That's not to say that NTMs are bad or uninteresting! They are super cool and I think they have huge potential in natural language understanding, reasoning, and planning. However, I do think that DeepMind will have to prove that they can be used to solve some non-trivial task, one that can't be solved much more efficiently with traditional CS methods, before people will join in on their research.

Also, I think there's a possibility that solving non-trivial problems with NTMs may require more computing power than Moore's law has given us so far. In the same way that NNs didn't really take off until GPU implementations became available, we may have to wait for the next big hardware breakthrough for NTMs to come into their own.

The brain is not a single universal neural network that does everything well. It's a collection of different neural networks that specialize in different tasks, and probably use very different methods to achieve them.

It seems like the way forward would be networking together various kinds of neural networks to achieve complex goals. For example, an NTM specialized in formulating plans that has access to a CNN for image recognition, and so on.

This is being done using various types of networks. See these slides on image captioning by Karpathy for an example using a CNN and RNN: http://cs.stanford.edu/people/karpathy/sfmltalk.pdf
If we're going with a brain metaphor, what would be those neural networks' version of synesthesia?
Feeding mp3s to an image recognition neural net. And as soon as I typed that, I wanted to try it.
Actually, in the architecture you described, if there is a planning net that's connected to image net and an audio net, rather than feeding audio to the image net I think synesthesia would be better modeled by feeding the output of the audio net into the image net's input on the planning net. If that makes sense.
Not the output. Rather, making several individual connections between intermediate layers of the different nets.
CNNs can actually be used for audio tasks too, by running them on spectrograms.
It's how some guys defeated the first iteration of reCAPTCHA's audio mode. Then Google replaced it with something very annoying to use, even for humans.
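To make the spectrogram idea above concrete, here is a minimal sketch of turning a 1-D audio signal into a 2-D time/frequency array, which is the kind of "image" a CNN could then consume. It uses a naive short-time DFT from the standard library only. The function name `spectrogram`, the frame and hop sizes, and the test tone are all assumptions chosen for illustration; a real pipeline would use an FFT library and typically a log or mel scale.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram via a naive short-time DFT.

    Returns a list of frames (time axis); each frame holds
    frame_size // 2 + 1 magnitudes (frequency axis).
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Hypothetical test signal: a pure tone with 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
image = spectrogram(tone)

# The strongest frequency bin in the first frame should be the tone's bin.
peak_bin = max(range(len(image[0])), key=lambda k: image[0][k])
```

Each row of `image` is one time step and each column one frequency bin, so a steady tone shows up as a bright vertical stripe at `peak_bin`. That 2-D structure is what lets convolutional filters pick up on audio patterns.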
They sure put a lot of focus on "toy" problems such as sorting and path planning in their papers - perhaps because they are easy to understand and show a major improvement over other ML approaches. IMHO they should focus more on "real" problems - e.g. in Table 1 of this paper it seems to be state of the art on the bAbI tasks, which is amazing.
At least some of the "toy" problems aren't chosen just for being easy to solve or understand. They're chosen for being qualitatively different than the kinds of problems other neural nets are capable of solving. Sorting, for example, is not something you can accomplish in practice with an LSTM.

Mainstream work on neural nets is focused on pattern recognition and generation of various forms. I don't mean to trivialize at all when I say this - this gives us a new way to solve problems with computers. It allows us to go beyond the paradigm of hand-built algorithms over bytes in memory.

What DeepMind is exploring with this line of research is whether neural nets can even subsume this older paradigm. Can they learn to induce the kinds of algorithms we're used to writing in our text editors? Given this goal, I think it's better to call problems like sorting "elementary" rather than "toy".

bAbI isn't really a "real" problem either, although somewhat better than sorting and the like. bAbI works with extremely restrictive worlds and grammar. In contrast, current speech recognition, language modeling, and object detection do quite well with actual audio, text, and pictures.

I think the strength of NTMs will be best demonstrated by putting them to work on a long-range language modeling task where you need to organize what you read so that you can use it to predict better a paragraph or two later. Current language models based on LSTMs are not really able to do this.

Any chance you could link a pdf of the paper for us?
Once you have a learning machine that can solve simple problems, you can scale it up to solve very complex problems. It's a first step to true AI imho. A lot of small steps are needed to go towards this goal. Integrating Memory & Neural Nets is a big step imho.
> Once you have a learning machine that can solve simple problems, you can scale it up to solve very complex problems.

Nope. It's really easy to solve simple problems; it can sometimes even be done by brute-force.

That's what caused the initial optimism around AI, e.g. the 1950s notion that it would be an interesting summer project for a grad student.

Insights into computational complexity during the 1960s showed that scaling is actually the difficult part. After all, if brute-force were scalable then there'd be no reason to write any other software (even if a more efficient program were required, the brute-forcer could write it for us).

That's why the rapid progress on simple problems, e.g. using ELIZA, SHRDLU, General Problem Solver, etc., hasn't been sustained, and why we can't just run those systems on a modern cluster and expect them to tackle realistic problems.
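The scaling point above can be illustrated with a minimal sketch: a brute-force "sorter" that simply tries orderings until one happens to be sorted. The function name `brute_force_sort` and the sample inputs are made up for illustration. It works fine on tiny inputs, but the number of candidate orderings grows as n!, so throwing more hardware at it does not help for long.

```python
from itertools import permutations
from math import factorial

def brute_force_sort(items):
    """Try every ordering until one is sorted.

    Returns (sorted_list, orderings_tried).
    """
    tried = 0
    for candidate in permutations(items):
        tried += 1
        if all(candidate[i] <= candidate[i + 1]
               for i in range(len(candidate) - 1)):
            return list(candidate), tried
    # Not reachable: some permutation of the input is always sorted.

# Easy on a simple problem...
result, tried = brute_force_sort([3, 1, 2])

# ...but a reverse-sorted input forces the search through every ordering,
# and the worst case grows factorially with input size.
_, worst_tried = brute_force_sort([3, 2, 1])
worst_case_10 = factorial(10)  # 3,628,800 orderings for 10 items
worst_case_20 = factorial(20)  # about 2.4 * 10**18 orderings for 20 items
```

Going from 10 items to 20 multiplies the worst case by roughly 670 billion, which is why success on the simple version says little about the realistic one.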