The Turing Machine idea makes a lot of sense: the machine is simply a state-machine graph that interacts with memory, so it's sensible that it could be "learned", much as in any genetic algorithm approach. Making the whole system differentiable is a pretty cool trick, though.
That said, the biggest challenge here, I imagine, is evaluating the learned system. It may give right answers, but how often does it give wrong answers? How can the learned "machine" be tested for correctness? How does overfitting come into the picture? For instance, halting can be neither proved nor guaranteed. This strikes me as a fundamental advantage of the more functional "feed forward" approach used by most learning systems.
The paper discusses putting the NTM through several tasks, and tests for "overfitting", or how well it has generalised the task, by giving it a longer task than it has seen during training. For example, in the copy task, they trained it on sequences of length 20, but tested it on a sequence of length 100.
Of course, this doesn't guarantee anything, but they also take a look at some of the internals of the learnt system which are more easily interpreted, and found that it does some pretty consistent things.
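To make that length-generalisation check concrete, here is a rough sketch of how such a test harness could look. Nothing here is from the paper's code: `make_copy_example`, `bit_error_rate` and the two toy "models" are hypothetical names, and the truncating copier is only a stand-in for a learner that has memorised the training lengths instead of learning the algorithm.

```python
import random

def make_copy_example(length, width=8, rng=random):
    """Random binary sequence; in the copy task the target equals the input."""
    seq = [[rng.randint(0, 1) for _ in range(width)] for _ in range(length)]
    return seq, [row[:] for row in seq]

def bit_error_rate(model, length, trials=50, width=8, seed=0):
    """Fraction of wrong bits when `model` copies sequences of a given length."""
    rng = random.Random(seed)
    wrong = total = 0
    for _ in range(trials):
        inputs, target = make_copy_example(length, width, rng)
        output = model(inputs)
        for out_row, tgt_row in zip(output, target):
            wrong += sum(o != t for o, t in zip(out_row, tgt_row))
            total += len(tgt_row)
    return wrong / total

def perfect_copier(inputs):
    # Hypothetical model that has actually learned "copy"
    return [row[:] for row in inputs]

def truncating_copier(inputs, limit=20):
    # Hypothetical model that only works up to the training length,
    # then emits zeros for everything after it
    return [row[:] if i < limit else [0] * len(row)
            for i, row in enumerate(inputs)]

if __name__ == "__main__":
    for name, model in [("perfect", perfect_copier),
                        ("truncating", truncating_copier)]:
        for length in (20, 100):
            print(name, length, bit_error_rate(model, length))
```

Both toy models look identical at length 20; only the longer test separates them. As said, a low error at length 100 is evidence, not a proof of correctness.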
I would like to add for those of you not familiar: The Neural Turing Machine method uses a neural network called a recurrent neural net. Recurrent neural nets are used in modeling time series data and have a neat training concept called backpropagation through time.
Here's a neat tutorial with an RBM (typically a feed forward net) as a recurrent net for those who want to just see what a recurrent net "looks like"
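For anyone who wants a feel for backpropagation through time without a framework, here is a minimal sketch on a one-unit recurrent net. The setup is my own simplification (scalar weights `w` and `u`, a tanh unit, squared error on the final state only), not anything taken from the paper or the tutorial.

```python
import math

def forward(w, u, xs):
    """Unroll h_t = tanh(w * h_{t-1} + u * x_t) from h_0 = 0; return all states."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, y):
    """Squared error between the final hidden state and the target y."""
    return 0.5 * (forward(w, u, xs)[-1] - y) ** 2

def bptt(w, u, xs, y):
    """Gradients of the loss w.r.t. w and u, walking the unrolled net backwards."""
    hs = forward(w, u, xs)
    dh = hs[-1] - y                      # dL/dh_T
    dw = du = 0.0
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)     # back through tanh at step t
        dw += da * hs[t - 1]             # the same w is reused at every step
        du += da * xs[t - 1]
        dh = da * w                      # pass the gradient to h_{t-1}
    return dw, du

if __name__ == "__main__":
    print(bptt(0.5, -0.3, [1.0, 0.5, -1.0, 2.0], 0.2))
```

The point is that the recurrent net is treated as a deep feed-forward net with shared weights, one layer per time step, so the gradient for each weight is a sum over steps.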
Nitpicking here, but while the authors do use a recurrent neural net (RNN), they do not use it exclusively.
The system consists of a memory element and a controller element. In their evaluation of the system, they use both a standard feed-forward network and an RNN with long short-term memory (LSTM) units as the controller element. In certain tasks, the feed-forward network works better.
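To illustrate how the controller can talk to the memory in a differentiable way, here is a small sketch of a content-based "soft" read: the controller emits a key, every memory row gets a weight from a softmax over scaled cosine similarity, and the read result is the weighted sum of rows. The function names and the plain-list representation are my own assumptions, and this leaves out the location-based addressing and the write side entirely.

```python
import math

def cosine(u, v):
    """Cosine similarity, with a small epsilon to avoid dividing by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def content_weights(memory, key, beta=1.0):
    """Softmax over beta-scaled similarity between the key and each memory row."""
    scores = [beta * cosine(row, key) for row in memory]
    top = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_read(memory, weights):
    """Weighted sum of memory rows: a blurry read instead of a hard lookup."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(width)]

if __name__ == "__main__":
    memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    weights = content_weights(memory, key=[0.0, 1.0, 0.0], beta=50.0)
    print(weights)
    print(soft_read(memory, weights))
```

Because every row contributes a little, the read is a smooth function of the key and the memory contents, which is what lets gradients flow through it. A larger `beta` sharpens the weights towards a single row.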
+1 on the deeplearning.net tutorials, and theano. I've learnt a lot from there.
Right. Mainly just low hanging fruit for those who aren't in this stuff day to day.
In a lot of my talks and day to day conversations, I've found people don't know the difference between a feed forward architecture vs. recurrent vs. recursive vs. ... you get the point :P
Originally I thought we could just change the present url to that one, but since the comments are only about the other paper, it seems better to just treat this as a dupe.