That might be your experience. I also prefer Claude for my tasks, but for general usage they are very close.
Leaderboards like LLM Arena show this: they rank all the latest models within 20-30 points of each other, which is almost a coin flip. A 30-point difference in Elo rating corresponds to roughly a 54%/46% preference split, so out of 11 answers you might prefer 6 from the best model and 5 from the worst.
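For anyone who wants to check the arithmetic, here is a minimal sketch using the standard Elo expected-score formula (base 10, 400-point scale). It assumes Arena scores can be read on that conventional scale. The function name `elo_win_probability` is just illustrative.

```python
def elo_win_probability(rating_diff: float) -> float:
    """Expected score of the higher-rated side under the standard Elo formula."""
    # Assumes the conventional logistic curve: base 10, 400-point scale.
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400))


for diff in (20, 30):
    p = elo_win_probability(diff)
    # Expected number of preferred answers out of 11 head-to-head comparisons
    print(f"{diff} points: {p:.1%} vs {1 - p:.1%}, ~{11 * p:.1f} of 11 answers")
```

A 30-point gap works out to about 54%, or roughly 6 of 11 answers. A 20-point gap is closer to 53%.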
It's crazy how different my personal experience is from LLM Arena's rankings. I'm very curious what use cases people have that don't overlap with mine.