|
|
|
|
|
by nextos
132 days ago
|
|
Deep SSMs, including the entire S4 to Mamba saga, are a very interesting alternative to transformers. In some of my genomics use cases, Mamba has been easier to train and scale over large context windows, compared to transformers. |
|