|
|
|
|
|
by kla-s
474 days ago
|
|
Actually its a little more nuanced: Operation Type|Mamba Complexity|Transformer Complexity Training(per iteration)|O(L)|O(L^2) Autoregressive Inference(per step)|O(T)|O(L) Memory Requirements|O(C)|O(L) Where: L stands for the sequence length. T denotes a fixed constant that accounts for compression and selection time in Mamba's autoregressive inference. C reflects the fixed size of the SSM (State Space Model) latent state in Mamba Per: https://github.com/state-spaces/mamba/issues/196 |
|