| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by miven 961 days ago

>a major feature of transformers being wildly faster inference than with LSTM

Wasn't the main issue with RNNs the fact that inference during training can't be efficiently parallelized?

The inference itself normally should be faster for an RNN than for a transformer since the former works in linear time in terms of input size while the latter is quadratic

1 comments

visarga 961 days ago

Mamba has dual view - you can use it both as CNN and RNN. The first is used for pre-training and for preloading the prompt because it can process all tokens at once. The second is used for token generation because it is O(1) per token. Basically two models in one, inheriting both advantages. This is possible because the Structured State Space layer is linear, so you can reshape some sums and unroll recursion into a convolution the size of the input, which can be further sped up with FFT.

link

inkysigma 961 days ago

As a quick point of clarification, I don't think MAMBA has a convolutional view since it drops the time invariance and is strictly linear. The authors use parallelized prefix sum to achieve some good speed up.

link

jimmySixDOF 961 days ago

and which is why the speed up is proportional to context length so starting near parity then, theoretically, see 100x at 100k tokens

link