HN Mirror



	by pattt 80 days ago
	Do we have any solid evidence these models can outperform Western models in terms of quality? Or is it more: because they are forbidden, they can't get enough training data, visibility etc. to compete?

1 comments

gpt5 80 days ago

Scroll down to the leaderboard - https://arcprize.org/leaderboard

Spoiler alert - they are all towards the bottom of the leaderboard. People come up with a wide variety of excuses for why they are not used despite being offered at significantly lower cost, but the answer is simply that they don't perform well enough for now.


aucisson_masque 79 days ago

DeepSeek V4 isn't even on there.

I'd rather trust the LM Arena leaderboard, which puts it on par with Sonnet.


gpt5 79 days ago

LM Arena uses human side-by-side voting, which limits its applicability to complex tasks.

The ARC Prize leaderboard does have DeepSeek V3.2, which only scored 4% on ARC-AGI 2 (while the top models score over 80%). It also has Kimi and Qwen, but they didn't perform well either.
