by cs702, 793 days ago
Yes, but the claim is about "unlimited context length." I doubt attention over each segment can be as good at recall as attention over the full input context.