Hacker News
by fennecbutt 245 days ago
Idk, though. I've seen many issues caused by a longer context. It makes sense: there are only so many attention heads, and each head's softmax has to spread a fixed total weight of 1 across every token in the context. So the longer the context, the less chance attention lands on the relevant tokens.
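A toy sketch of that dilution effect, under loud assumptions: one attention head, exactly one relevant token whose logit beats every other token's by a fixed `score_gap` (a hypothetical parameter), and all distractor tokens tied. Real models learn their scores, so this only illustrates the softmax arithmetic, not the behavior of any particular model.

```python
import math


def relevant_weight(context_len: int, score_gap: float = 5.0) -> float:
    """Softmax weight one head puts on a single relevant token (toy model).

    Assumes the relevant token's logit exceeds each of the other
    (context_len - 1) tokens' logits by `score_gap`, and those others tie.
    """
    if context_len < 1:
        raise ValueError("context_len must be >= 1")
    # Relevant token contributes exp(score_gap); each distractor contributes exp(0) = 1.
    relevant = math.exp(score_gap)
    return relevant / (relevant + (context_len - 1))


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(f"{n:>7} tokens -> weight on relevant token = {relevant_weight(n):.4f}")
```

With the gap held fixed, the weight on the relevant token falls roughly like 1/n. Keeping it constant would require the score gap to grow roughly like log(n), since solving w = e^g / (e^g + n - 1) for g gives g = log(n - 1) + log(w / (1 - w)).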