Hacker News
by thomasahle 774 days ago
Transformers and SSMs can't do long computations that are inherently sequential.
Unless you give them chain of thought. In which case they do great.
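To make "inherently sequential" concrete, here is a toy sketch in Python. It is not a transformer or an SSM, just an analogy: composing a long list of permutations is often used as an example of a task where each result depends on the one before it. The names `compose` and `run_with_scratchpad` are made up for illustration. The scratchpad list plays the role of chain of thought: each step only needs the previous written-out state plus one new input.

```python
import random

def compose(p, q):
    """Return the permutation 'apply p, then q' (both are tuples of indices)."""
    return tuple(q[i] for i in p)

def run_with_scratchpad(perms, n=5):
    """Fold the permutations left to right, recording every intermediate state.

    Each step reads only the previous state and one new permutation, which is
    the part the written-out intermediate tokens play in a chain of thought.
    """
    state = tuple(range(n))      # start from the identity permutation
    scratchpad = [state]
    for p in perms:
        state = compose(state, p)
        scratchpad.append(state)
    return state, scratchpad

random.seed(0)
perms = [tuple(random.sample(range(5), 5)) for _ in range(8)]
final, steps = run_with_scratchpad(perms)
for t, s in enumerate(steps):
    print(t, s)
```

Without the scratchpad, the whole chain of 8 compositions has to be resolved in one shot; with it, every step is a small constant amount of work on top of what was already written down.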