by euclaise 793 days ago
This one does have attention; it's just chunked into segments of 4096 tokens.
1 comment

Yes, but the claim is about "unlimited context length." I doubt attention over each segment can be as good at recall as attention over the full input context.
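
To make the concern concrete, here is a rough sketch of which earlier positions a query token can attend to directly under segment-local attention versus full causal attention. The function names and the causal masking are my own assumptions for illustration, not the model's actual code, and I use a segment length of 4 instead of 4096 so the output is readable. Whatever mechanism the model uses to carry information between segments is not described in this thread and is not modeled here.

```python
from typing import List, Optional


def can_attend(query_pos: int, key_pos: int, segment_len: Optional[int] = None) -> bool:
    """Return True if the query position may attend directly to the key position.

    Assumes causal attention (no looking ahead). segment_len=None means
    full attention over the whole input; otherwise attention is restricted
    to keys inside the same fixed-size segment as the query.
    """
    if key_pos > query_pos:
        return False  # causal: future tokens are never visible
    if segment_len is None:
        return True  # full attention: every earlier token is visible
    # chunked attention: only tokens in the same segment are visible
    return query_pos // segment_len == key_pos // segment_len


def visible_keys(query_pos: int, segment_len: Optional[int] = None) -> List[int]:
    """List every key position the query can attend to directly."""
    return [k for k in range(query_pos + 1) if can_attend(query_pos, k, segment_len)]


if __name__ == "__main__":
    # Toy segment length of 4 stands in for 4096.
    print(visible_keys(9, segment_len=4))  # [8, 9]
    print(visible_keys(9))                 # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With chunking, the token at position 9 has no direct attention path to anything in positions 0 through 7, so recalling a fact from an earlier segment has to rely on something other than attention itself.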