| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by pama 843 days ago
	A dynamic gate is a pretty distinct feature from previous SSM architectures in my opinion. In a sense, the overall fundamental architecture of mamba is still that of the transformer but with attention replaced by an SSM with dynamic gating. All of deep learning uses closely related ideas, but the SSM class of models took advantage of stability guarantees from integrators in control theory and created a class of RNN that don’t have to worry about exploding gradients. Mamba is one of the ways to make these SSM models much more expressive.

1 comments

Straw 840 days ago

Its distinct, but not very- its an EMA without assuming uniform time. The stability of EMA has nothing to do with integrators in control theory and neither do these models.

These models aren't really RNNs- they have only a linear gate which cannot depend on previous tokens at this layer, so they cant update their state in a way which depends on the current state very much.

link