Hacker News new | ask | show | jobs
by moffkalast 720 days ago
The 4k sliding window context seems like a controversial choice after Mistral 7B mostly failed at showing any benefits from it. What was the rationale behind that instead of just going for full 8k or 16k?
1 comments

This is mostly about inference speed, while maintaining long context performance.