Hacker News
by
moffkalast
720 days ago
The 4k sliding window context seems like a controversial choice after Mistral 7B mostly failed at showing any benefits from it. What was the rationale behind that instead of just going for full 8k or 16k?
1 comment
alekandreev
720 days ago
This is mostly about inference speed, while maintaining long context performance.
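For readers unfamiliar with the trade-off being discussed, here is a minimal sketch of how a causal sliding-window mask limits the number of keys each token attends to, which is where the inference-speed benefit would come from. The function names and the toy sizes are hypothetical and chosen for illustration; this is not the model's actual implementation, and it assumes a plain causal window with no global or interleaved full-attention layers.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j], True when query position i may attend to key position j.

    Assumption: causal attention (j <= i) restricted to the most recent
    `window` positions, including the current one.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def keys_attended_per_step(seq_len: int, window: int) -> list[int]:
    """Number of keys each position attends to; this caps at `window`,
    so per-token attention cost and KV-cache reads stop growing with context."""
    return [min(i + 1, window) for i in range(seq_len)]


if __name__ == "__main__":
    # Toy sizes so the mask is readable; the thread discusses a 4k (4096) window.
    mask = sliding_window_mask(seq_len=6, window=3)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    print(keys_attended_per_step(seq_len=6, window=3))
```

With full attention the per-token cost keeps growing with context length, while with a window it plateaus once the context exceeds the window size. Whether long-context quality holds up under that restriction is the point being debated in the thread.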