by GaggiX 806 days ago
The paper shows that the speed is comparable to transformer models, and faster for smaller models at "long" sequence lengths like 8k.