by aucisson_masque 29 days ago
There isn't even DeepSeek V4.

I'd rather trust the LM Arena leaderboard, which puts it on par with Sonnet.

1 comment

LM Arena uses human side-by-side voting, which limits its applicability to complex tasks.

The ARC Prize leaderboard does have DeepSeek V3.2, which only scored 4% on ARC-AGI 2 (while the top models score over 80%). It also has Kimi and Qwen, but they didn't perform well either.