by jeremycochoy 877 days ago
Well, RWKV is using some linear (non-quadratic) form of attention so... strictly speaking... you still need a bit of attention :D