by spwa4
499 days ago
So the point of this release is:

1) Code + weights are Apache 2.0 licensed (enough to run locally, enough to train, not enough to reproduce this version)

2) Low latency, meaning 11 ms per token (so ~90 tokens/sec on 4xH100)

3) Performance, according to Mistral, somewhere between Qwen 2.5 32B and Llama 3.3 70B, roughly equal with GPT-4o mini

4) ollama run mistral-small (14 GB download) gives 9 tokens/sec on the question "who is the president of the US?" (also to enjoy that the answer ISN'T orange idiot)
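The jump from "11 ms per token" to "~90 tokens/sec" is just a reciprocal, and the same arithmetic turns the local 9 tokens/sec into a per-token latency. A minimal sketch of the conversion, assuming single-stream generation with no batching; the function names are made up for illustration and are not part of any Mistral or ollama tooling:

```python
def tokens_per_second(latency_ms: float) -> float:
    """Convert per-token latency (milliseconds) to throughput (tokens/sec)."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def latency_ms_per_token(tokens_per_sec: float) -> float:
    """Convert throughput (tokens/sec) to per-token latency (milliseconds)."""
    if tokens_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / tokens_per_sec


# Quoted figure: 11 ms per token
print(round(tokens_per_second(11), 1))    # 90.9

# Local ollama run: 9 tokens/sec
print(round(latency_ms_per_token(9), 1))  # 111.1
```

So the local run comes out to roughly 111 ms per token, about 10x the quoted 11 ms figure.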