| HN Mirror



	by Muller20 2147 days ago
	What's holding back NTMs is that they are hard to train, even harder than RNNs. It's not that they are much less efficient than a Transformer. Rather, the Transformer has all the advantages of an NTM but is much easier to train. Actually, the way I see it, the Transformer is a direct descendant of memory-based architectures (NTM, MemNet, stack-based RNNs...) that is both expressive and easy to train.
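
	To make the "descendant" point concrete, here is a rough sketch (not taken from either paper's code) that puts an NTM-style content-based read next to a Transformer-style attention read. Both are a softmax-weighted sum over stored rows; the main difference shown here is cosine similarity with a sharpening factor versus a scaled dot product with separate keys and values. The function names, the toy memory, and `beta=5.0` are all illustrative assumptions, and the NTM's location-based addressing and write heads are left out.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Small epsilon avoids division by zero for all-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-8)

def weighted_sum(weights, rows):
    dim = len(rows[0])
    return [sum(w * row[i] for w, row in zip(weights, rows)) for i in range(dim)]

def ntm_content_read(key, memory, beta=5.0):
    # NTM-style content addressing: cosine similarity sharpened by beta
    # (beta is an assumed value here), then a soft read over memory rows.
    weights = softmax([beta * cosine(key, row) for row in memory])
    return weights, weighted_sum(weights, memory)

def attention_read(query, keys, values):
    # Transformer-style read: dot product scaled by 1/sqrt(d), with
    # keys and values allowed to differ.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * dot(query, k) for k in keys])
    return weights, weighted_sum(weights, values)

# Toy memory with three slots (hypothetical data).
memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.1]

ntm_weights, ntm_out = ntm_content_read(query, memory)
att_weights, att_out = attention_read(query, memory, memory)
```

	In this toy setup both reads put the most weight on the first slot, since it is the one most aligned with the query. The sketch only covers the read side; it says nothing about why one is easier to train than the other.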