Hacker News
euclaise, 793 days ago
This one does have attention; it's just chunked into segments of 4096 tokens.
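To make "chunked into segments" concrete, here is a minimal sketch of segment-local attention. It is a generic illustration, not the actual implementation of the model under discussion. Queries can only attend to keys in their own segment, so a token that really needs something from an earlier segment gets zero attention weight on it. Any mechanism the model uses to carry information across segments is not shown here. The helper names `segment_mask` and `attention_weights` are hypothetical. For simplicity, attention within a segment is bidirectional (no causal mask), and a segment size of 4 stands in for 4096.

```python
import math
from typing import List


def segment_mask(seq_len: int, segment: int) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend to key j (same segment only)."""
    return [[i // segment == j // segment for j in range(seq_len)]
            for i in range(seq_len)]


def attention_weights(scores: List[List[float]],
                      mask: List[List[bool]]) -> List[List[float]]:
    """Row-wise softmax over allowed keys; masked-out keys get weight 0."""
    out = []
    for row, mrow in zip(scores, mask):
        top = max(s for s, m in zip(row, mrow) if m)  # for numerical stability
        exps = [math.exp(s - top) if m else 0.0 for s, m in zip(row, mrow)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


if __name__ == "__main__":
    n, seg = 8, 4  # seg=4 stands in for the 4096-token segments
    scores = [[0.0] * n for _ in range(n)]
    scores[6][1] = 10.0  # query 6 strongly "wants" key 1, which is in the previous segment
    full = attention_weights(scores, [[True] * n for _ in range(n)])
    local = attention_weights(scores, segment_mask(n, seg))
    print(round(full[6][1], 3), local[6][1])
```

With full attention, query 6 puts almost all of its weight on key 1. With segment-local attention that weight is exactly 0, which is the recall gap the reply below is worried about.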
cs702, 793 days ago
Yes, but the claim is about "unlimited context length." I doubt attention within each segment can match attention over the full input context at recall.