by arnaudsm
202 days ago
Geometric mean of MMMLU + GPQA-Diamond + SimpleQA + LiveCodeBench:
- Gemini 3.0 Pro: 84.8
- DeepSeek 3.2: 83.6
- GPT-5.1: 69.2
- Claude Opus 4.5: 67.4
- Kimi-K2 (1.2T): 42.0
- Mistral Large 3 (675B): 41.9
- DeepSeek-3.1 (670B): 39.7

The 14B, 8B, and 3B models are SOTA though, and do not have Chinese censorship like Qwen3.
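For anyone wanting to reproduce this kind of ranking, here is a rough Python sketch of how a geometric mean over four benchmark scores could be computed. The function name and the score lists are my own placeholders, not the per-benchmark numbers behind the figures above. The point is only that a geometric mean punishes one weak benchmark much harder than a plain average does.

```python
import math

def geometric_mean(scores):
    """Geometric mean of positive scores: the n-th root of their product."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("scores must be a non-empty list of positive numbers")
    # Average the logs, then exponentiate; equivalent to the n-th root of the product.
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Placeholder scores for illustration only, NOT real benchmark results.
balanced = [70.0, 70.0, 70.0, 70.0]
lopsided = [90.0, 90.0, 90.0, 10.0]  # same arithmetic mean (70) as `balanced`

print(round(geometric_mean(balanced), 1))
print(round(geometric_mean(lopsided), 1))
```

Both lists average 70 arithmetically, but the lopsided one lands well below that under the geometric mean, which is presumably why a model that does badly on one of the four benchmarks drops so far in the list.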