by iandanforth
4261 days ago
I'm going to be stupid in public in the hope that someone will correct me.

1. I'm not clear on the point of this paper. There are a lot of buzzwords and an extremely diverse set of references. The heart of the paper seems to be a comparison between Long Short-Term Memory (LSTM) recurrent nets and their NTM nets. But they don't expose the network to very long sequences, or to sequences broken by arbitrarily long delays, which are what LSTM nets are particularly good at. They seem to make the jump from "LSTM nets are theoretically Turing complete" to "LSTM nets are a good benchmark for any computational task."

2. The number of training examples seems huge. For many of the tasks they trained over hundreds of thousands of sequences. This seems like very, very slow learning. If I'm meant to interpret these results as a network learning a computational rule (copying, sorting, etc.), is it really that impressive if it takes 200k examples before it gets it right? (Not sarcasm, I really don't know.)
Re: number of training examples, I'm taking the x-axis of the chart on pg 11 to be the number of training examples shown. Based on that, it looks like the NTM is learning a lot faster than the LSTM. As far as I can tell, it's getting near 0 loss about 20,000 examples in? Whether learning with 20k examples is impressive depends on the domain; personally I think it's comparatively impressive.
Re: cherry-picking of tasks to highlight perceived strengths of NTM, fair enough. Although this is one I'll be playing around with a bit to find out where those strengths start and stop...
Any thoughts on how this compares to the approach of HTMs?