Hacker News
by cs702 793 days ago
Yes, but the claim is about "unlimited context length." I doubt attention over each segment can be as good at recall as attention over the full input context.