|
|
|
|
|
by murbard2
2108 days ago
|
|
> There is absolutely no training signal that tells an RNN to remember something beyond the BPTT horizon. So why would it? Because they generalize. Char-RNN learn to balance parenthesis separated by a longer distance than the BPTT window because they've learned that counting parenthesis is useful for prediction based on parenthetical statements shorter than the BPTT window. |
|