| HN Mirror



	by ajcp 849 days ago
	Mistral 8x7b can handle a context of ~32,000 tokens pretty comfortably, and it benchmarks at or above GPT-3.5.

1 comments

Is that the sliding context window size? Because I didn't have good results with sliding context windows in the regular Mistral models.

Yeah, I think they fine-tune without targeting a specific window size and then keep expanding the context until it starts falling over.
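
For anyone unsure what "sliding context window" means in this thread: a rough sketch of a causal sliding-window attention mask, where each token only attends to itself and the few tokens right before it. The function name `sliding_window_mask` and the tiny sizes are made up for illustration; this is not taken from any Mistral code.

```python
def sliding_window_mask(seq_len, window):
    """Return a seq_len x seq_len boolean mask.

    mask[i][j] is True when query position i may attend to key position j:
    j must not be in the future (j <= i) and must lie within the last
    `window` positions, counting i itself (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Illustrative sizes only: 6 tokens, window of 3.
    for row in sliding_window_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))
```

With a window smaller than the sequence, early tokens drop out of direct view for later positions, which is one plausible reason long-range recall can feel worse than with full attention over the same context.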