by eli_gottlieb 602 days ago
>Our key insight is that the diagonal linear recurrent layer can act as a gradient accumulator

So they're sort of reinventing the discrete-time differentiator from signal processing, but parameterized neurally?
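A minimal sketch of the signal-processing analogy, assuming a diagonal linear recurrence of the form h_t = a * h_{t-1} + b * x_t applied elementwise (the quoted work's exact parameterization isn't given in this thread, and the function names below are hypothetical). With a = 1 and b = 1 the recurrence is a running sum, i.e. an accumulator or discrete-time integrator. The first-difference filter d_t = y_t - y_{t-1} is the discrete-time differentiator that undoes it.

```python
import numpy as np

def diagonal_linear_recurrence(x, a, b):
    """Hypothetical helper: run h_t = a * h_{t-1} + b * x_t elementwise.

    `a` and `b` can be scalars or per-channel arrays (the diagonal of the
    state matrix and the input gain). The state starts at zero.
    """
    x = np.asarray(x, dtype=float)
    h = np.zeros(x.shape[1:])
    out = []
    for x_t in x:  # iterate over the time axis (axis 0)
        h = a * h + b * x_t
        out.append(h.copy())
    return np.stack(out)

def first_difference(y):
    """Hypothetical helper: discrete-time differentiator d_t = y_t - y_{t-1}, with y_{-1} = 0."""
    y = np.asarray(y, dtype=float)
    prev = np.concatenate([np.zeros_like(y[:1]), y[:-1]], axis=0)
    return y - prev

# Toy per-step "gradients" for two parameters over three steps (made-up values).
grads = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
accum = diagonal_linear_recurrence(grads, a=1.0, b=1.0)  # running sum over time
recovered = first_difference(accum)                      # differentiating undoes it
```

Choosing a slightly below 1 would turn the accumulator into a leaky, exponentially decaying sum instead of a plain integrator.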


Converging slowly on Kalman filters, calling it now.