by sjkoelle
916 days ago
my loose understanding: 1) transformers build an (input length) x (input length) attention matrix, which is unnecessarily large because it grows quadratically with sequence length. state space models instead compress the history into a fixed-size state. 2) "The main difference is simply making several parameters [in the state space model] functions of the input". 3) I think it might be more sample efficient (requires less data), but I'm not sure about that.
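To make points 1) and 2) concrete, here is a toy sketch in plain Python. It is an illustration under my own assumptions, not the model from the quoted paper: `attention_scores` and `selective_scan` are hypothetical names, the inputs are scalars, and only the step size `delta` is made a function of the input (the input and output projections are fixed to 1 for simplicity).

```python
import math


def attention_scores(xs):
    """Pairwise score matrix: one entry per (query, key) pair, so n x n."""
    return [[q * k for k in xs] for q in xs]


def selective_scan(xs, state_size=4):
    """Toy state space recurrence with an input-dependent step size.

    Assumption: a diagonal state matrix with fixed negative decay rates,
    and input/output projections fixed to 1.
    """
    h = [0.0] * state_size          # fixed-size state, independent of len(xs)
    ys = []
    for x in xs:
        # The "function of the input" part: step size via softplus(x) > 0.
        delta = math.log1p(math.exp(x))
        for i in range(state_size):
            a = -(i + 1.0)                    # fixed decay rate for channel i
            a_bar = math.exp(delta * a)       # discretized decay, in (0, 1)
            h[i] = a_bar * h[i] + delta * x   # update the compressed history
        ys.append(sum(h))                     # read out the state
    return ys, h


xs = [0.5, -1.0, 2.0, 0.1, 0.3]
scores = attention_scores(xs)        # 5 x 5 entries, grows as n^2
ys, final_state = selective_scan(xs) # state stays at 4 numbers for any n
```

The attention table has one entry per pair of positions, while the recurrence carries the same number of state values no matter how long the input gets. A large `delta` lets the current input overwrite more of the state, and a small one mostly preserves what was already there.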