by epups
700 days ago
The graphs seem to indicate their model trades blows with Llama 3.1 405B, which has more than 3x the number of tokens and (presumably) a much bigger compute budget. It's kind of baffling if this is confirmed. Apparently Llama 3.1 relied on artificial data; I would be very curious about the type of data that Mistral uses.