Hacker News
by iandanforth 4261 days ago
I'm going to be stupid in public in the hope that someone will correct me.

1. I'm not clear on the point of this paper.

There are a lot of buzzwords and an extremely diverse set of references. The heart of the paper seems to be a comparison between Long Short-Term Memory (LSTM) recurrent nets and their NTM nets. But they don't expose the network to very long sequences, or to sequences broken by arbitrarily long delays, which are what LSTM nets are particularly good at. They seem to make the jump from "LSTM nets are theoretically Turing complete" to "LSTM nets are a good benchmark for any computational task."

2. The number of training examples seems huge

For many of the tasks they trained over hundreds of thousands of sequences. This seems like very slow learning. If I'm meant to interpret these results as a network learning a computational rule (copying, sorting, etc.), is it really that impressive if it takes 200k examples before it gets it right? (Not sarcasm, I really don't know.)
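For anyone wondering what one "example" of a rule like copying might look like, here is a rough sketch of a generator for a single copy-task sequence. The 8-bit vectors, the length range of 1 to 20, the extra delimiter channel, and the name `make_copy_example` are all assumptions made for illustration; nothing in this thread pins down the paper's exact setup.

```python
import random

def make_copy_example(rng, max_len=20, width=8):
    """Build one (inputs, targets) pair for a toy copy task.

    Assumed layout (hypothetical, for illustration only):
      - `length` random bit vectors, each with a 0 in an extra delimiter channel
      - one delimiter step (all zeros, delimiter channel set to 1)
      - `length` blank steps during which the net must reproduce the sequence
    """
    length = rng.randint(1, max_len)
    seq = [[rng.randint(0, 1) for _ in range(width)] for _ in range(length)]

    inputs = [bits + [0] for bits in seq]              # presentation phase
    inputs.append([0] * width + [1])                   # delimiter step
    inputs += [[0] * (width + 1) for _ in range(length)]  # recall phase, no input

    # Nothing is expected until after the delimiter; then the original sequence.
    targets = [[0] * width for _ in range(length + 1)]
    targets += [list(bits) for bits in seq]
    return inputs, targets

rng = random.Random(0)
inputs, targets = make_copy_example(rng)
```

Each call yields one training sequence, so "200k examples" would mean roughly 200k calls like this, each with a fresh random sequence rather than a repeat of a fixed dataset.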

2 comments

Re: point of the paper, I think it's addressing a current need within representation learning research where there's this question of "Ok, we can teach really large neural networks stuff, but how do we compress that knowledge efficiently?" How can we learn more compact/efficient/reliable/discrete representations? I've only just finished reading it through and this seems to me to be a promising direction and one I'd like to see more research on.

Re: number of training examples, I'm reading the chart on pg 11 as plotting the number of training examples shown. Based on that, it looks like the NTM is learning a lot faster than the LSTM. As far as I can tell, it's getting near 0 loss about 20,000 examples in? Whether learning w/ 20k examples is impressive depends on the domain; personally I think it's comparatively impressive.

Re: cherry picking of tasks to highlight perceived strengths of NTM, fair enough. Although this is one I'll be playing around with a bit to find out where that starts and stops...

Any thoughts on how this compares to the approach of HTMs?

I think your criticisms are mostly misplaced.

- Re: "buzzwords...references": I don't see any buzzwords; in fact, the word "deep" doesn't even appear in the text. Regarding references, a typical conference paper cites a bunch of related papers written by people who might be reviewing it. This paper, on the other hand, cites some seminal work from other fields, which is more interesting and enriching for most readers.

- Re: point of the paper. The point is how to design a learning computer that can access a large-capacity long-term memory store and can still be optimized by gradient descent. (I.e., everything is differentiable.)

- Re: number of training examples is huge. Training neural networks often takes a huge number of iterations, and the problems considered in the paper are numerically challenging so the iteration count is not surprising. Also, just like the regular Turing machine, the "neural Turing machine" isn't the most efficient architecture, but it's conceptually the simplest one that has the desired properties.
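
To make the "everything is differentiable" point above a bit more concrete, here is a minimal sketch of a soft, content-based memory read. Instead of picking one memory slot (a discrete choice with no useful gradient), the read is a weighted average over all slots, with weights from a softmax over similarity scores. The function names, the cosine similarity, and the `sharpness` parameter are assumptions chosen for illustration, not the paper's exact formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def soft_read(memory, key, sharpness=5.0):
    """Read from memory by similarity to `key` (hypothetical helper).

    Every slot contributes in proportion to its weight, so the result
    varies smoothly with both the key and the memory contents.
    """
    weights = softmax([sharpness * cosine(row, key) for row in memory])
    width = len(memory[0])
    read = [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(width)]
    return weights, read

memory = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
weights, read = soft_read(memory, key=[0.0, 1.0, 0.0])
```

Here the key matches the second slot, so most of the weight lands there, but the other slots still get a small share. That blurriness is what lets gradients flow back through the addressing step.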