Hacker News new | ask | show | jobs
by sebzim4500 1184 days ago
The time and space complexity of inference is constant wrt. context size. You will probably need more parameters to match the performance of a transformer though, so whether it scales better in practice is an open question.