Hacker News
by jeremycochoy 877 days ago
Well, RWKV uses a linear (non-quadratic) form of attention, so strictly speaking you still need a bit of attention :D
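
To make "linear (non-quadratic)" concrete, here is a rough sketch, not RWKV's actual code. It assumes a heavily simplified, single-channel version of an RWKV-style weighted average: each past value `v[i]` gets weight `exp(k[i] - (t - i) * w)`. The names `wkv_quadratic` and `wkv_linear` are made up for this example, and the sketch leaves out the extra term RWKV applies to the current token as well as any numerical stabilisation. The point is only that the same output can be computed either by looking back over all past tokens (quadratic in sequence length) or by carrying two running sums (linear).

```python
import math


def wkv_quadratic(k, v, w):
    """Naive O(T^2): at each step t, reweight every past value from scratch."""
    out = []
    for t in range(len(k)):
        # Weight for position i decays with its distance (t - i) from step t.
        weights = [math.exp(k[i] - (t - i) * w) for i in range(t + 1)]
        # zip stops at the shorter sequence, so only v[0..t] is used.
        out.append(sum(a * x for a, x in zip(weights, v)) / sum(weights))
    return out


def wkv_linear(k, v, w):
    """Same result in O(T): carry a decayed running numerator and denominator."""
    decay = math.exp(-w)
    num = 0.0
    den = 0.0
    out = []
    for k_t, v_t in zip(k, v):
        # Decay the old state once, then add the current token's contribution.
        num = decay * num + math.exp(k_t) * v_t
        den = decay * den + math.exp(k_t)
        out.append(num / den)
    return out
```

Both functions should agree up to floating-point error on any input, which is the sense in which the "attention" is still there, just computed as a recurrence.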