| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by spwa4 546 days ago

So the point of this release is

1) code + weights Apache 2.0 licensed (enough to run locally, enough to train, not enough to reproduce this version)

2) Low latency, meaning 11ms per token (so ~90 tokens/sec on 4xH100)

3) Performance, according to mistral, somewhere between Qwen 2.5 32B and Llama 3.3 70B, roughly equal with GPT4o-mini

4) ollama run mistral-small (14G download) 9 tokens/sec on the question "who is the president of the US?" (also to enjoy that the answer ISN'T orange idiot)