by Eridrus
3323 days ago
One logical continuation of adding more attention steps is to let the network itself decide how many attention steps to take, à la "Adaptive Computation Time for Recurrent Neural Networks". Are you planning to go in that direction?
[1]: https://arxiv.org/abs/1610.07647