by degrews 310 days ago
It's because those markets are based on the LMArena leaderboard (https://lmarena.ai/), where Claude has historically done poorly.

That eval has also become a lot less relevant (it's considered not very indicative of real-world performance), so it's unlikely Anthropic will prioritize optimizing for it in future models.

2 comments

Anthropic has always been one of the best at not optimizing for stupid metrics. Rather, they spend significant energy researching weaknesses and building metrics around those. Google is also pretty on point IMO, but they can also afford to dedicate resources to these nonsense metrics, since they are still good marketing.

Meanwhile, Meta and xAI are behind the curve and largely marketing-focused.

True. I'm surprised the markets aren't based on, e.g., OpenRouter usage or something similar.