by fennecbutt 245 days ago
Idk, I've seen many issues occur because of a longer context. I mean, it makes sense: given there are only so many attention heads, the longer the context, the less chance attention will pick out the relevant tokens.
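Rough toy sketch of the intuition, not a claim about any real model: a single softmax attention head, one relevant token whose logit sits a fixed `margin` above every distractor, and all distractors at logit 0. The function name, the margin of 5.0, and the context lengths are made up for illustration. Note this shows softmax dilution in one head, which is related to the head-count argument but not the same thing.

```python
import math

def weight_on_relevant(n_tokens, margin=5.0):
    """Softmax weight on the one relevant token (hypothetical setup).

    Assumes the relevant token has logit `margin` and the other
    n_tokens - 1 distractor tokens all have logit 0.
    """
    relevant = math.exp(margin)
    distractors = n_tokens - 1  # each distractor contributes exp(0) = 1
    return relevant / (relevant + distractors)

# Same margin, longer context: the relevant token's share keeps shrinking.
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(weight_on_relevant(n), 4))
```

With a fixed margin, the share going to the relevant token falls roughly as 1/n once the distractors outnumber exp(margin), so the model would need sharper logits to keep up as the context grows.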