by WithinReason 768 days ago
Can you expand on the "cannot solve fundamentally" part?

So does anything do proper state tracking? And don’t point to the OP, since very often purportedly better new architectures end up being basically vaporware (like Mamba or RWKV, which still don’t have good-quality pretrained models).
How do you mean vaporware?

Surely whether a big model using a certain architecture exists depends only on the choices of those with the resources to train it. That reflects their beliefs, not the architecture's actual performance.

Transformers and SSMs can't do long computations that are inherently sequential, at least not within a single forward pass.

Unless you give them chain of thought, in which case they do great.
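A toy illustration of the difference, assuming "state tracking" means something like following a sequence of swaps applied to a row of items (a common example in these discussions). This is plain Python, not a model, and the function names are hypothetical. Answering directly means the whole chain of updates has to be composed before the final state is reported. The chain-of-thought version writes out every intermediate state, so each new line only needs one small update applied to the previous line.

```python
from typing import List, Tuple

Swap = Tuple[int, int]  # hypothetical encoding: exchange the items at positions i and j


def apply_swap(state: List[int], swap: Swap) -> List[int]:
    """Return a new arrangement with the items at positions i and j exchanged."""
    i, j = swap
    new_state = list(state)
    new_state[i], new_state[j] = new_state[j], new_state[i]
    return new_state


def final_state(initial: List[int], swaps: List[Swap]) -> List[int]:
    """Direct answer: only the end result is reported, so all steps are composed internally."""
    state = initial
    for swap in swaps:
        state = apply_swap(state, swap)
    return state


def chain_of_thought(initial: List[int], swaps: List[Swap]) -> List[List[int]]:
    """Write out every intermediate state; each step reads only the previous entry."""
    trace = [initial]
    for swap in swaps:
        trace.append(apply_swap(trace[-1], swap))
    return trace


if __name__ == "__main__":
    swaps = [(0, 1), (1, 2)]
    print(final_state([0, 1, 2], swaps))       # [1, 2, 0]
    print(chain_of_thought([0, 1, 2], swaps))  # [[0, 1, 2], [1, 0, 2], [1, 2, 0]]
```

Both functions reach the same answer. The difference is that the second one externalizes the state after each step, which is roughly what chain of thought lets a model do with its own output tokens.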