Hacker News
by imtringued 260 days ago
Deep Equilibrium Models

>We present a new approach to modeling sequential data: the deep equilibrium model (DEQ). Motivated by an observation that the hidden layers of many existing deep sequence models converge towards some fixed point, we propose the DEQ approach that directly finds these equilibrium points via root-finding. Such a method is equivalent to running an infinite depth (weight-tied) feedforward network, but has the notable advantage that we can analytically backpropagate through the equilibrium point using implicit differentiation.

https://arxiv.org/abs/1909.01377
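The implicit differentiation part is what makes this practical: the gradient depends only on the fixed point z*, not on the iterations used to reach it, so nothing from the solver loop has to be stored. Below is a minimal scalar sketch, assuming a toy layer f(z) = tanh(w*z + x) with |w| < 1 so that plain iteration converges. By the implicit function theorem, dz*/dw = (df/dw) / (1 - df/dz), with both partials evaluated at z*. The result is checked against a finite difference. Names like solve and implicit_grad_w are illustrative, not from the paper.

```python
import math

def solve(w, x, tol=1e-13, max_iter=10_000):
    """Find z* = tanh(w*z* + x) by plain fixed-point iteration."""
    z = 0.0
    for _ in range(max_iter):
        z_next = math.tanh(w * z + x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

def implicit_grad_w(w, x):
    """dz*/dw via the implicit function theorem, using only z*."""
    z = solve(w, x)
    s = 1.0 - z * z      # tanh'(a) at the fixed point, since z* = tanh(a)
    df_dz = s * w        # partial of f with respect to z
    df_dw = s * z        # partial of f with respect to w
    return df_dw / (1.0 - df_dz)

w, x = 0.6, 0.4
g_implicit = implicit_grad_w(w, x)

# Central finite difference for comparison (re-solves the fixed point twice).
eps = 1e-6
g_fd = (solve(w + eps, x) - solve(w - eps, x)) / (2 * eps)
```

The two values agree to several decimal places. In a real DEQ the division becomes a linear solve with (I - df/dz), which the paper also handles iteratively rather than by forming the matrix.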

What's fascinating about deep equilibrium models is that you only need a single layer: iterated to its fixed point, it is equivalent to an infinitely deep, weight-tied network with many layers. Recursion is all you need! Because the solver runs until convergence instead of for a fixed depth, the model can spend more iterations on difficult inputs and fewer on easy ones.
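A minimal sketch of the forward pass, assuming a toy 2-dimensional layer f(z, x) = tanh(W z + U x + b) whose weights W are small enough (max row sum of |W| below 1) that the map is a contraction. The paper uses a quasi-Newton root-finder (Broyden's method); this sketch swaps in plain fixed-point iteration for readability. W, U, b and the function names are hypothetical. The point is that the iteration count is decided at run time by the convergence test, not by a fixed number of layers.

```python
import math

def layer(z, x, W, U, b):
    """One weight-tied layer: f(z, x) = tanh(W z + U x + b)."""
    n = len(z)
    return [
        math.tanh(
            sum(W[i][j] * z[j] for j in range(n))
            + sum(U[i][j] * x[j] for j in range(len(x)))
            + b[i]
        )
        for i in range(n)
    ]

def deq_forward(x, W, U, b, tol=1e-10, max_iter=500):
    """Apply the same layer repeatedly until the hidden state stops changing."""
    z = [0.0] * len(b)
    for k in range(1, max_iter + 1):
        z_next = layer(z, x, W, U, b)
        step = max(abs(a - c) for a, c in zip(z_next, z))
        z = z_next
        if step < tol:
            return z, k      # equilibrium z* and the "depth" actually used
    raise RuntimeError("did not converge within max_iter")

W = [[0.5, -0.2], [0.1, 0.3]]   # max row sum of |W| is 0.7 < 1
U = [[1.0, 0.0], [0.0, 1.0]]
b = [0.1, -0.1]
z_star, iters = deq_forward([0.3, -0.7], W, U, b)
```

Different inputs (or a looser tol) change iters while the weights stay the same, which is the sense in which depth becomes adaptive.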


Thanks, something like that was going through my mind; it's nice to have a good reference for it. Any insights into why this isn't a more popular approach? Maybe a single layer is too hard to scale.

I recently read a paper on something similar for diffusion, called Fixed Point Diffusion Models. They specialize the first and last layers but apply the middle layer repeatedly until it converges.
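A toy sketch of the structure described above: a first layer and a last layer that each run once, with a single weight-tied middle layer iterated to a fixed point in between. All three layers here are hypothetical stand-ins chosen so the middle map is a contraction; this only illustrates the control flow and is not the paper's architecture or code.

```python
import math

def first_layer(x):
    # Hypothetical input-specific layer (runs once): a simple affine map.
    return [0.8 * v + 0.1 for v in x]

def middle_layer(z, h):
    # Hypothetical weight-tied layer, reapplied until convergence.
    # The first layer's output h is re-injected at every step.
    return [math.tanh(0.5 * zi + hi) for zi, hi in zip(z, h)]

def last_layer(z):
    # Hypothetical output-specific layer (runs once): average the features.
    return sum(z) / len(z)

def forward(x, tol=1e-9, max_iter=200):
    h = first_layer(x)
    z = [0.0] * len(h)
    for k in range(1, max_iter + 1):
        z_next = middle_layer(z, h)
        converged = max(abs(a - b) for a, b in zip(z_next, z)) < tol
        z = z_next
        if converged:
            break
    return last_layer(z), z, k

out, z_star, iters = forward([0.2, -0.5, 1.0])
```

In this sketch max_iter acts as a compute budget: lowering it stops the middle loop early and returns an approximate fixed point.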

Since a Transformer is a residual model, each layer is presumably adding a more and more precise adjustment to the current token's representation. It makes a lot of sense to think of the layers as the steps of an optimisation method.
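A toy illustration of that analogy, under a strong assumption: here each residual branch is a hypothetical gradient-descent step on L(x) = 0.5 * ||x - t||^2, so the residual update x + f(x) is exactly x - lr * grad L(x). Real Transformer layers are not literally gradient steps; this only shows why stacked residual updates can look like an optimiser whose corrections shrink as it approaches a target.

```python
def residual_branch(x, target, lr=0.5):
    # Hypothetical branch f(x) = -lr * grad L(x), with L(x) = 0.5 * ||x - target||^2,
    # so grad L(x) = x - target.
    return [-lr * (xi - ti) for xi, ti in zip(x, target)]

def run(x, target, depth=8, lr=0.5):
    update_sizes = []
    for _ in range(depth):
        delta = residual_branch(x, target, lr)
        x = [xi + di for xi, di in zip(x, delta)]   # residual connection: x + f(x)
        update_sizes.append(max(abs(d) for d in delta))
    return x, update_sizes

x_final, sizes = run([2.0, -1.0, 0.5], [0.0, 0.0, 1.0])
# sizes halves at every "layer": 1.0, 0.5, 0.25, ...
```

With lr = 0.5 the distance to the target halves at each layer, so each residual update is half the size of the previous one.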