by mochidusk
891 days ago
I struggled to learn Mamba's architecture but realized it was because I had some gaps in knowledge. In no particular order, they were:

- a refresher on differential equations
- Legendre polynomials
- state space models; you need to grok the essence of x' = Ax + Bu, y = Cx
- the discretization of S4
- the HiPPO matrix
- GPU architecture (SRAM, HBM)

Basically, the transformer is an architecture built around attention. Mamba follows the same overall recipe (a deep stack of blocks) but replaces attention with a modified S4. Plain S4 can already act like a CNN during training and an RNN during inference. Mamba makes some of its parameters input-dependent ("selective") to overcome S4's shortcomings, which gives up the convolutional view, so it trains with a hardware-aware parallel scan instead (that's where the SRAM/HBM part comes in).

I found this video very helpful: https://www.youtube.com/watch?v=8Q_tqwpTpVU

His other videos are really good too.
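To make the "CNN during training, RNN during inference" idea concrete, here is a minimal Python sketch, assuming a tiny diagonal, real-valued SSM with made-up parameters (A, B, C, DELTA and all function names are hypothetical, not taken from the S4 or Mamba code). It discretizes x' = Ax + Bu, y = Cx with zero-order hold (the rule the Mamba paper uses; S4 originally used the bilinear transform), then computes the output once step by step and once as a single causal convolution with the kernel K[j] = C Ā^j B̄.

```python
import math

# Hypothetical toy parameters (not from any trained model): a diagonal,
# real-valued continuous-time SSM  x'(t) = A x(t) + B u(t),  y(t) = C x(t).
A = [-0.5, -1.0, -2.0]   # diagonal of A (negative entries => stable decay)
B = [1.0, 1.0, 1.0]
C = [0.3, -0.2, 0.5]
DELTA = 0.1              # step size used for discretization


def discretize_zoh(a_diag, b_vec, delta):
    """Zero-order-hold discretization for a diagonal A.

    A_bar = exp(delta * A),  B_bar = A^{-1} (exp(delta * A) - I) B,
    computed elementwise because A is diagonal.
    """
    a_bar = [math.exp(delta * a) for a in a_diag]
    b_bar = [(ab - 1.0) / a * b for ab, a, b in zip(a_bar, a_diag, b_vec)]
    return a_bar, b_bar


def run_recurrent(a_bar, b_bar, c_vec, inputs):
    """RNN view: update the hidden state one step at a time (inference)."""
    state = [0.0] * len(a_bar)
    outputs = []
    for u in inputs:
        state = [ab * s + bb * u for ab, s, bb in zip(a_bar, state, b_bar)]
        outputs.append(sum(c * s for c, s in zip(c_vec, state)))
    return outputs


def ssm_kernel(a_bar, b_bar, c_vec, length):
    """K[j] = C A_bar^j B_bar, the impulse response of the discrete SSM."""
    return [
        sum(c * ab ** j * bb for c, ab, bb in zip(c_vec, a_bar, b_bar))
        for j in range(length)
    ]


def run_convolutional(a_bar, b_bar, c_vec, inputs):
    """CNN view: one causal convolution of the whole sequence with K (training)."""
    kernel = ssm_kernel(a_bar, b_bar, c_vec, len(inputs))
    return [
        sum(kernel[j] * inputs[k - j] for j in range(k + 1))
        for k in range(len(inputs))
    ]


if __name__ == "__main__":
    u = [1.0, 0.0, -0.5, 2.0, 0.25, 0.0, 1.5]
    a_bar, b_bar = discretize_zoh(A, B, DELTA)
    y_rnn = run_recurrent(a_bar, b_bar, C, u)
    y_cnn = run_convolutional(a_bar, b_bar, C, u)
    # Both views give the same outputs (up to floating-point rounding).
    assert all(abs(p - q) < 1e-12 for p, q in zip(y_rnn, y_cnn))
    print([round(y, 4) for y in y_rnn])
```

The convolution here is the naive O(L²) version; S4 computes the kernel efficiently and applies it with an FFT. The equivalence only holds because A, B, C and DELTA stay fixed across the sequence, which is exactly what Mamba's input-dependent parameters give up.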