by Muller20 2100 days ago
What's holding back NTMs is that they are hard to train, even harder than RNNs. Efficiency is not the issue: they are not much less efficient than a Transformer. Rather, the Transformer has all the advantages of an NTM while being much easier to train.

Actually, the way I see it, the Transformer is a direct descendant of memory-based architectures (NTM, MemNet, stack-based RNNs...) that is both expressive and easy to train.
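
To make the family resemblance concrete, here is a rough sketch (my own illustration, not code from any NTM or Transformer implementation) of an NTM-style content-based read next to a single-query scaled dot-product attention read. Both are a softmax over similarity scores followed by a weighted sum over stored rows. The function names, the toy `memory` matrix, and the `beta` value are all made up for the example, and it leaves out the rest of the NTM (location-based addressing, writes, the controller).

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_read(weights, rows):
    # Weighted sum of rows: the "read vector".
    dim = len(rows[0])
    return [sum(w * row[i] for w, row in zip(weights, rows)) for i in range(dim)]

def ntm_content_read(key, memory, beta=1.0):
    # NTM-style content addressing (hypothetical helper):
    # cosine similarity to each memory row, sharpened by key strength beta.
    def cosine(a, b):
        return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-8)
    weights = softmax([beta * cosine(key, row) for row in memory])
    return weights, weighted_read(weights, memory)

def attention_read(query, keys, values):
    # Scaled dot-product attention for one query (hypothetical helper):
    # dot-product similarity scaled by sqrt(d), then the same softmax + sum.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return weights, weighted_read(weights, values)

# Toy memory with three slots of dimension 2 (values chosen arbitrarily).
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]

ntm_w, ntm_out = ntm_content_read(query, memory, beta=5.0)
att_w, att_out = attention_read(query, memory, memory)
```

The only differences in this sketch are the similarity function (cosine with a sharpening factor vs. scaled dot product) and the fact that attention keeps separate keys and values. The training difficulty mentioned above comes from the parts of the NTM that this sketch omits, so treat it as showing the shared read mechanism only.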