	by epups 700 days ago
	The graphs seem to indicate their model trades blows with Llama 3.1 405B, which has more than 3x the number of tokens and (presumably) a much bigger compute budget. It would be kind of baffling if this is confirmed. Apparently Llama 3.1 relied on artificial data; I would be very curious about the type of data that Mistral uses.