by JohannaAlmeida 70 days ago
Yeah, RWKV is definitely related in spirit (recurrent state for long context). Here I’m combining local windowed attention with a gated recurrent path plus KV cache compression, so it’s more of a hybrid than a full replacement for attention.
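
To make the hybrid idea concrete, here is a rough toy sketch in plain Python. It is not the actual implementation: the names (`hybrid_step`, `window_size`, `decay`, `gate`) are hypothetical, the gate is a fixed scalar where a real model would presumably learn it, and folding evicted tokens into a running average only stands in for whatever KV cache compression is really used.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def local_attention(q, keys, values):
    """Scaled dot-product attention over only the tokens in the window."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def hybrid_step(x, window, state, window_size=4, decay=0.9, gate=0.5):
    """Process one token vector x (hypothetical toy layer).

    window: recent token vectors, used here as both keys and values.
    state:  recurrent summary of tokens that have left the window.
    """
    window = window + [x]
    if len(window) > window_size:
        evicted, window = window[0], window[1:]
        # Assumption: evicted tokens are folded into the state via an
        # exponential moving average, so the cache stays bounded.
        state = [decay * s + (1 - decay) * e for s, e in zip(state, evicted)]
    attn = local_attention(x, window, window)
    # Fixed scalar gate mixing the local and recurrent paths.
    out = [gate * a + (1 - gate) * s for a, s in zip(attn, state)]
    return out, window, state

tokens = [[float(i), 1.0] for i in range(10)]
window, state = [], [0.0, 0.0]
for tok in tokens:
    out, window, state = hybrid_step(tok, window, state)
```

The window never grows past `window_size`, so attention cost per token stays constant, while anything older only reaches the output through `state`.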