HN Mirror


	by rvnx 282 days ago
	Claude Opus 4.1 is way above the others in terms of quality of the answers (especially for programming)

2 comments

elAhmo 282 days ago

That might be your experience. I also prefer Claude for my tasks, but for general usage they are very close.

Leaderboards like LLM Arena show this: they rank all the latest models within 20-30 points of each other, which is almost a coin flip. A 30-point difference in Elo rating is ~54%/46%, so out of 11 answers, you might prefer 6 from the best model and 5 from the worst.
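
For anyone who wants to check the arithmetic, here is a minimal sketch of the standard Elo expected-score formula. It assumes the usual 400-point logistic scale; the function name `elo_expected_score` is made up for illustration, and whether a given leaderboard uses exactly this scale is an assumption.

```python
def elo_expected_score(rating_diff: float, scale: float = 400.0) -> float:
    """Expected win probability for the higher-rated side.

    Standard Elo logistic curve; the 400-point scale is the conventional
    default and is an assumption about how the leaderboard scores models.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


# Compare the 20- and 30-point gaps mentioned above.
for diff in (20, 30):
    p = elo_expected_score(diff)
    print(f"{diff}-point gap: {p:.1%} vs {1 - p:.1%}")
```

Under that assumption a 30-point gap works out to about 54% vs 46%, which is roughly 6 out of 11.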

jasonjmcghee 282 days ago

It's crazy how different my personal experience is compared to LLM Arena. Very curious which use cases people have that don't overlap with mine.

croes 282 days ago

I play code ping-pong between multiple AIs to get some decent code. They all fail at some point.