by sjkoelle 916 days ago

my loose understanding

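1) transformers build a (sequence length) x (sequence length) attention matrix, so its size grows quadratically with the input, which is arguably larger than it needs to be. state space models, as far as i can tell, compress the history into a fixed-size state instead.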

2) "The main difference is simply making several parameters [in the state space model] functions of the input"

3) i think it might be more sample efficient (requires less data)
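
to make points 1) and 2) concrete, here is a rough toy sketch in plain python. it is not the paper's implementation: the function names, the scalar weights `w_B`, `w_C`, `w_dt`, and the single-channel setup are all made up for illustration. the first function builds the L x L attention weights; the second runs a recurrence whose state has a fixed size no matter how long the sequence is, with the step size and the B and C parameters computed from the current input.

```python
import math


def attention_weights(x):
    """Softmax self-attention weights for x, a list of L vectors. Returns an L x L matrix."""
    d = len(x[0])
    rows = []
    for q in x:
        # One score per (query, key) pair: this is where the L x L cost comes from.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def selective_scan(x, A, w_B, w_C, w_dt):
    """Toy input-dependent SSM over a scalar sequence x.

    A is a diagonal state matrix stored as a list of N negative numbers.
    w_B, w_C (lists of N floats) and w_dt (float) are hypothetical weights
    that turn the current input into B_t, C_t and the step size dt.
    """
    h = [0.0] * len(A)  # fixed-size state, independent of len(x)
    ys = []
    for x_t in x:
        dt = math.log1p(math.exp(w_dt * x_t))  # softplus keeps the step size positive
        B_t = [w * x_t for w in w_B]           # B depends on the input
        C_t = [w * x_t for w in w_C]           # C depends on the input
        # Discretized update: decay the old state, then write in the new input.
        h = [math.exp(dt * a) * h_i + dt * b * x_t
             for a, h_i, b in zip(A, h, B_t)]
        ys.append(sum(c * h_i for c, h_i in zip(C_t, h)))
    return ys, h
```

for a sequence of length 16 the attention matrix has 16 x 16 entries, while the scan above carries only `len(A)` numbers from step to step. if `dt`, `B_t` and `C_t` were constants instead of functions of `x_t`, this would reduce to an ordinary (non-selective) linear state space recurrence.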