by Muller20
2100 days ago
What's holding back NTMs is that they are hard to train, even harder than RNNs. It's not that they are much less efficient than a Transformer. Rather, the Transformer has all the advantages of the NTM but is much easier to train. Actually, the way I see it, the Transformer is a direct descendant of memory-based architectures (NTM, MemNet, stack-based RNNs...) that is both expressive and easy to train.
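To make the "descendant" point concrete, here is a minimal sketch (my own illustration, not code from either paper) comparing an NTM-style content-based read with a single-query scaled dot-product attention read. The function names, the toy `memory`, and the `beta` value are assumptions chosen for the example, and the NTM's location-based addressing and write heads are left out entirely.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, rows):
    # Convex combination of the rows, one weight per row.
    dim = len(rows[0])
    return [sum(w * row[j] for w, row in zip(weights, rows)) for j in range(dim)]

def ntm_content_read(key, memory, beta=1.0):
    # NTM-style content addressing: softmax over beta * cosine similarity.
    # (Sketch only: no location-based addressing, no writes.)
    def cosine(a, b):
        norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
        return dot(a, b) / (norm + 1e-8)
    weights = softmax([beta * cosine(key, row) for row in memory])
    return weights, weighted_sum(weights, memory)

def attention_read(query, keys, values):
    # Scaled dot-product attention for a single query vector.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return weights, weighted_sum(weights, values)

# Hypothetical toy memory with three slots of dimension 2.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]

ntm_w, ntm_r = ntm_content_read(query, memory, beta=5.0)
att_w, att_r = attention_read(query, memory, memory)
```

Both reads are a softmax-weighted sum over memory slots; they differ mainly in the similarity function (cosine with a sharpening factor versus a scaled dot product). In a Transformer the "memory" is just the set of token representations, which is recomputed in parallel at every layer rather than written to sequentially, and that is plausibly part of why it is easier to train.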