| Makes me think once again about the similarity between Finite Impulse Response[1] filters (traditional LLMs) and Infinite Impulse Response[2] filters (recursive models). Not that it's a very good or original analogy. Anyway, with FIR you typically need many, many times more coefficients to get the same filter cutoff performance that a few IIR coefficients can give you. You can convert an IIR to an FIR using, for example, the window design method[3]: with a rectangular window function you essentially unroll the recursion but stop after some finite depth. Similarly, it seems that if you unroll the TRM you end up with the traditional LLM architecture of many repeated attention+ff blocks, minus the global feedback part. And unlike a true IIR, the TRM does implement a finite cut-off, so in that sense it is more like a traditional FIR/LLM than the structure suggests. So it would perhaps be interesting to compare the TRM network to a similarly unrolled version. Then again, maybe this is all mad ramblings from a sleep-deprived mind. [1]: https://en.wikipedia.org/wiki/Finite_impulse_response [2]: https://en.wikipedia.org/wiki/Infinite_impulse_response [3]: https://en.wikipedia.org/wiki/Finite_impulse_response#Window... |
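To make the "unroll the recursion but stop after some finite depth" part concrete, here is a minimal sketch in plain Python. It assumes the simplest possible IIR, a one-pole low-pass with a single feedback coefficient `a`, and the function names are made up for illustration. The FIR version just keeps the first `n_taps` samples of the IIR's impulse response, which is what a rectangular window amounts to.

```python
def iir_one_pole(x, a):
    """y[n] = a*y[n-1] + (1-a)*x[n]: one feedback coefficient, infinite memory."""
    y, prev = [], 0.0
    for sample in x:
        prev = a * prev + (1.0 - a) * sample
        y.append(prev)
    return y


def truncated_fir_taps(a, n_taps):
    """Impulse response of the filter above, h[k] = (1-a)*a**k,
    cut off after n_taps samples (a rectangular window)."""
    return [(1.0 - a) * a**k for k in range(n_taps)]


def fir(x, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], no feedback."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


def max_abs_error(a, n_taps, x):
    """Largest gap between the recursive filter and its unrolled version."""
    ref = iir_one_pole(x, a)
    approx = fir(x, truncated_fir_taps(a, n_taps))
    return max(abs(r - p) for r, p in zip(ref, approx))


# Unit step input; compare one IIR coefficient against 4, 16 and 64 FIR taps.
step = [1.0] * 200
a = 0.9
errors = {n: max_abs_error(a, n, step) for n in (4, 16, 64)}
```

For this toy filter the error of the unrolled version shrinks roughly like `a**n_taps`, so with `a = 0.9` it takes dozens of taps to get close to what the single recursive coefficient does. How far that carries over to TRM versus a stack of unrolled blocks is of course exactly the open question.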
>We present a new approach to modeling sequential data: the deep equilibrium model (DEQ). Motivated by an observation that the hidden layers of many existing deep sequence models converge towards some fixed point, we propose the DEQ approach that directly finds these equilibrium points via root-finding. Such a method is equivalent to running an infinite depth (weight-tied) feedforward network, but has the notable advantage that we can analytically backpropagate through the equilibrium point using implicit differentiation.
https://arxiv.org/abs/1909.01377
What's fascinating about deep equilibrium models is that a single weight-tied layer, iterated until it reaches a fixed point, is equivalent to an infinitely deep weight-tied network, so it can stand in for a conventional deep neural network with many layers. Recursion is all you need! Because the solver runs until convergence, the model can end up using more iterations on difficult inputs and fewer on easy ones.
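A toy sketch of the idea, assuming a scalar "layer" `z = tanh(w*z + x)` with a made-up weight `w = 0.5` (small enough that the iteration is a contraction). This is plain forward iteration, not the root-finding solver or the implicit differentiation the paper uses, so treat it as an illustration of the fixed point only.

```python
import math


def layer(z, x, w):
    """One weight-tied 'layer': the same w is reused at every depth."""
    return math.tanh(w * z + x)


def solve_fixed_point(x, w, tol=1e-8, max_iter=1000):
    """Iterate the layer until z stops changing; return (z_star, iterations used)."""
    z = 0.0
    for i in range(1, max_iter + 1):
        z_next = layer(z, x, w)
        if abs(z_next - z) < tol:
            return z_next, i
        z = z_next
    return z, max_iter


def unrolled(x, w, depth):
    """The same layer stacked `depth` times, like an ordinary deep network
    whose layers all share their weights."""
    z = 0.0
    for _ in range(depth):
        z = layer(z, x, w)
    return z


w = 0.5
z_a, n_a = solve_fixed_point(0.1, w)  # input keeps tanh in its near-linear range
z_b, n_b = solve_fixed_point(3.0, w)  # input pushes tanh into saturation
deep = unrolled(0.1, w, depth=50)     # a 50-layer unrolled stack on the same input
```

In this toy the two inputs need different numbers of iterations to converge (`n_a` is larger than `n_b`), and the 50-layer unrolled stack lands on the same value as the fixed-point solve. Whether iteration count tracks task difficulty in a real model is a separate empirical question.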