by woodson 70 days ago
Look into RWKV.

Yeah, RWKV is definitely related in spirit (recurrent state for long context). Here I'm combining local windowed attention with a gated recurrent path plus KV cache compression, so it's more of a hybrid than a full replacement for attention.
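
For readers who want something concrete, here is a rough NumPy sketch of what a hybrid like this could look like. It is an illustration under assumptions, not the actual implementation: the function names (`local_window_attention`, `gated_recurrent_path`, `hybrid_layer`), the sigmoid gate, the weight matrices, and the choice to simply add the two paths are all hypothetical. KV cache compression is not shown.

```python
import numpy as np

def local_window_attention(q, k, v, window):
    """Causal attention where position t sees only the last `window` positions."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(T)
    # allowed[t, s] is True when s <= t and t - s < window
    allowed = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)
    scores = np.where(allowed, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def gated_recurrent_path(x, gate):
    """Running state h_t = g_t * h_{t-1} + (1 - g_t) * x_t, with g_t in (0, 1)."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = gate[t] * h + (1.0 - gate[t]) * x[t]
        out[t] = h
    return out

def hybrid_layer(x, Wq, Wk, Wv, Wg, window):
    """Assumed combination: sum of the local attention and recurrent outputs."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    gate = 1.0 / (1.0 + np.exp(-(x @ Wg)))  # sigmoid gate (assumption)
    return local_window_attention(q, k, v, window) + gated_recurrent_path(v, gate)
```

The attention path only ever looks at the last `window` tokens, so anything older can reach the output only through the recurrent state, which is the sense in which this is a hybrid rather than an attention replacement.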