Hacker News
by
woodson
70 days ago
Look into RWKV.
1 comment
JohannaAlmeida
70 days ago
Yeah, RWKV is definitely related in spirit (recurrent state for long context). Here I’m combining local windowed attention with a gated recurrent path + KV cache compression, so it’s more of a hybrid than a full replacement for attention.
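To make the hybrid idea concrete, here is a rough sketch in plain Python. It is not the actual implementation: `hybrid_forward`, the fixed `decay` gate, the `mix` weight, and folding evicted tokens into a single state vector are all assumptions chosen for illustration. A real model would use learned projections and learned gates.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_attention(q, keys, values):
    """Scaled dot-product attention of one query over the cached window."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def hybrid_forward(xs, window=4, decay=0.9, mix=0.5):
    """Hypothetical hybrid layer (illustrative only).

    - Local path: causal attention over the last `window` tokens.
    - Recurrent path: tokens evicted from the window are folded into a
      fixed-size state with a constant gate (`decay`), a crude stand-in
      for a learned gate and for KV cache compression.
    Assumption: queries, keys, and values are the raw inputs
    (no learned projections).
    """
    dim = len(xs[0])
    state = [0.0] * dim   # compressed summary of everything outside the window
    cache = []            # holds at most `window` recent tokens
    outputs = []
    for x in xs:
        cache.append(x)
        if len(cache) > window:
            evicted = cache.pop(0)
            state = [decay * s + (1.0 - decay) * e
                     for s, e in zip(state, evicted)]
        attn = local_attention(x, cache, cache)
        outputs.append([mix * a + (1.0 - mix) * s
                        for a, s in zip(attn, state)])
    return outputs, state, cache
```

The point of the sketch is the memory profile: the cache never grows past `window` entries and the recurrent state stays one vector, so per-token cost is independent of sequence length, while attention still sees the most recent tokens exactly.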