| HN Mirror


	by apetresc 318 days ago
	How on Earth does that market have Anthropic at 2%, in a dead heat with the likes of Meta? If the market was about yesterday rather than 5 months from now I think Claude would be pretty clearly the front runner. Why does the market so confidently think they’ll drop to dead last in the next little while?

7 comments

degrews 318 days ago

It's because those markets are based on the LLM Arena leaderboard (https://lmarena.ai/), where Claude has historically done poorly.

That eval has also become a lot less relevant (it's considered not very indicative of real-world performance), so it's unlikely Anthropic will prioritize optimizing for it in future models.

kmacdough 317 days ago

Anthropic has always been one of the best at not optimizing for stupid metrics. Rather, they spend significant energy researching weaknesses and building metrics around those. Google is also pretty on point IMO, but they can afford to dedicate resources to these nonsense metrics too, since they are still good marketing.

Meanwhile, Meta and xAI are behind the curve and largely marketing-focused.

ttroyr 305 days ago

True. I'm surprised they are not based on e.g. OpenRouter usage or similar.

Buttons840 318 days ago

How is Claude doing on the benchmark that market is based on? Maybe not so good? Idk. Just because Claude is good for real world use doesn't mean it's winning the benchmark, but the benchmark is all that matters for the Polymarket.

tedk-42 317 days ago

I'm a fan of Anthropic for this reason. I use Claude and it's very good most of the time for my coding requirements.

Generally, when you have a lot of companies competing to show whose product X does the best at Y, there are a lot of monetary incentives to tune the products to perform well specifically on those types of tests.

vasco 318 days ago

If you think it's wrong, participate. That's the only way prediction markets end up predicting anything.

Tadpole9181 318 days ago

Ah, yes, if you disagree you must participate in real money gambling based on the outcome of a single user-based, single-prompt leaderboard.

vasco 318 days ago

Well I for example don't give a shit what prediction markets do and never participated, but if someone thinks they're wrong, they should just participate and get free money. Otherwise why complain.

apetresc 317 days ago

I wasn't complaining per se; I was asking for (and expecting) a legitimate reason. Which I got: the market is resolved purely based on LLM Arena, which Anthropic has never done well on (which says more about the benchmark than about Anthropic).

vasco 317 days ago

You got a random person saying a random thing. There's no explanation for a market. The same way the stock market doesn't move for the reason the articles say it does. Everyone on each side has their own multitude of reasons.

sinuhe69 318 days ago

I think they also based their expectation on release cycles and update speed. Anthropic is known for a more conservative release cycle and incremental updates. Google, on the other hand, has accelerated recently. It also seems that other actors are better at benchmark cheating ;)

epiccoleman 318 days ago

I find this confusing too. I dropped my OpenAI subs for Claude a while back and I don't feel like I'm missing much.

I need to spend some more time with Gemini too though. I was using that as a backend for Cursor for a while and had some good results there too.

manmal 318 days ago

Claude is a useful tool, IMO the most useful one even, but not a road to AGI.

globular-toast 317 days ago

I mean, if you feel strongly enough that it will be #1 at the end of year then $100 now would net you $3000 end of year... Do bear in mind what my sibling said about the specific benchmark that is being used, though.
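
For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes a simple binary market where each "Yes" share pays out $1 if the outcome happens and $0 otherwise, and it ignores fees, spreads, and slippage. The function names are made up for illustration and are not any market's API.

```python
def binary_market_payout(stake: float, price: float) -> tuple[float, float]:
    """Return (gross payout, net profit) for a winning 'Yes' bet.

    Assumes each share costs `price` dollars and pays $1 on a win.
    Fees and slippage are ignored (simplifying assumption).
    """
    if not 0 < price < 1:
        raise ValueError("price must be strictly between 0 and 1")
    shares = stake / price        # number of shares the stake buys
    payout = shares * 1.0         # each winning share pays $1
    return payout, payout - stake


def implied_price(stake: float, net_profit: float) -> float:
    """Share price implied by a quoted stake and net profit on a win."""
    return stake / (stake + net_profit)


# At the 2% price mentioned at the top of the thread:
payout, profit = binary_market_payout(100, 0.02)

# Price implied by "$100 now would net you $3000":
price_for_3000 = implied_price(100, 3000)
```

Under these assumptions, a 2% price would pay out about $5000 on a $100 stake, while a $3000 net corresponds to a share price of a bit over 3%, so the exact figure depends on the price at the time you buy.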
