Hacker News new | ask | show | jobs
by ogogmad 843 days ago
That might explain the motivation for why the Δ variable is used and varied; but not the "Selectivity", which the article says is expressed by how the matrices B and C vary while consuming input.

Something I've noticed is that B, C and Δ depend only on the current token. See this: https://www.kolaayonrinde.com/blog/images/mamba/ssm_algorith... -- Another thing is that I've noticed that the definition of "SSM" in the image I've linked to is apparently recursive. This is also in the Arxiv paper. Strange.

+1 though for making me go back to the article and read it more carefully! +1 also to the article.

1 comments

OK, I've noticed that the pseudo-code above is vectorised, and so there's no recursion. The SSM function is actually described at the start of the paper, and an efficient hardware-aware implementation is suggested in section 3: https://arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf