Hacker News
by thomasahle 774 days ago
Transformers and SSMs can't carry out long, inherently sequential computations in a single forward pass.

Unless you give them chain of thought, in which case they do great.