|
|
|
|
|
by gpt5
38 days ago
|
|
LM Arena uses human side by side voting, which limits its applicability to complex tasks. The ARCPrize leaderboard does have Deepseek V3.2, which only scored 4% on ARC-AGI 2 (while the top models score over 80%). It also Kimi and Qwen, but they also didn't perform well. |
|