|
|
|
|
|
by ogogmad
843 days ago
|
|
That might explain the motivation for why the Δ variable is used and varied; but not the "Selectivity", which the article says is expressed by how the matrices B and C vary while consuming input. Something I've noticed is that B, C and Δ depend only on the current token. See this: https://www.kolaayonrinde.com/blog/images/mamba/ssm_algorith... -- Another thing is that I've noticed that the definition of "SSM" in the image I've linked to is apparently recursive. This is also in the Arxiv paper. Strange. +1 though for making me go back to the article and read it more carefully! +1 also to the article. |
|