Hacker News new | ask | show | jobs
by murbard2 2108 days ago
> There is absolutely no training signal that tells an RNN to remember something beyond the BPTT horizon. So why would it?

Because they generalize. Char-RNN learn to balance parenthesis separated by a longer distance than the BPTT window because they've learned that counting parenthesis is useful for prediction based on parenthetical statements shorter than the BPTT window.