Hacker News
by ajcp 849 days ago
Mistral 8x7b can handle a context of ~32,000 tokens pretty comfortably, and it benchmarks at or above GPT-3.5.
1 comments

Is that the sliding context window size? Because I didn't have good results with sliding context windows in the regular Mistral models.
Yeah, I think they fine-tune without a specific target window size and then keep expanding the context until it starts falling over.
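
For anyone unfamiliar with the "sliding context window" mentioned in this thread, here is a rough sketch of the general idea: each token attends only to itself and a fixed number of tokens before it, instead of the whole preceding context. The function name `sliding_window_mask` and the sizes used are made up for illustration and are not taken from any Mistral code or configuration.

```python
def sliding_window_mask(seq_len, window):
    """Build a causal sliding-window attention mask (illustrative only).

    mask[i][j] is True when query position i may attend to key position j,
    i.e. when j is at most `window - 1` positions behind i and not ahead of it.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


# Hypothetical toy sizes: 6 tokens, each seeing itself plus the 2 tokens before it.
mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row is one token, with an `x` for every position it can see. Once the sequence is longer than the window, the earliest tokens fall out of direct view, which is one plausible reason long-range recall can feel worse with this kind of window.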