by phillipcarter 956 days ago
Interesting. I was skeptical about some of their claims regarding longer context, since it's been my experience that these models just get lost after enough of it.

Yeah, degraded performance on long contexts has been observed in plenty of other models [https://arxiv.org/abs/2307.03172], so I was cautious too. Unfortunately, I don't have access to 4-32k; I would have liked to test that out as well.