Hacker News new | ask | show | jobs
by msp26 955 days ago
Yeah, degraded performance on long contexts has been observed in plenty of other models [https://arxiv.org/abs/2307.03172] so I was cautious too. Unfortunately I don't have access to 4-32k. I would have liked to test that out too.