| HN Mirror

by XCSme 14 days ago

On my tests[0] it does a bit worse, and it's almost 2x more expensive than Opus 4.7...

I was surprised to see that it failed a Data extraction test (it gets it right 2/3 times, but one time it randomly returns null for a value instead).

It makes some sense that it fails more trivia/domain-specific knowledge tasks (I think models are increasingly trained for agentic use cases rather than general intelligence).

[0]: https://aibenchy.com/compare/anthropic-claude-opus-4-7-mediu...

3 comments

XCSme 14 days ago

For some reason everything is 2x (2x cost, 2x avg response time, 2x reasoning and output tokens)...

Double-checking my test harness, but it's the first model that does this, so I doubt the issue is on my side...

EDIT: Harness seems correct; for straight coding tasks they perform identically: https://i.snipboard.io/5xbpzY.jpg

dwaltrip 14 days ago

Wait, doesn’t the blog post say the price is the same as 4.7?

> Claude Opus 4.8 is available everywhere today. Pricing for regular usage is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. Pricing for fast mode is $10 per million input tokens and $50 per million output tokens.

Where do you see the 2x cost?

XCSme 14 days ago

The total cost of running my benchmarks was 1.6x higher compared to Opus 4.7, mostly because of 2x output tokens:

https://i.snipboard.io/vrdwTa.jpg
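
A rough sketch of how unchanged per-token prices can still produce a higher bill. The prices are the ones quoted above ($5 input, $25 output per million tokens); the token counts are hypothetical, picked only to show that doubling output tokens can give roughly a 1.6x total, and are not taken from the benchmark. It also assumes "1.6x" means the ratio of the two totals.

```python
# Regular-usage prices quoted in the thread (USD per million tokens).
INPUT_PRICE = 5.0
OUTPUT_PRICE = 25.0


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for one benchmark run at the prices above."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


# Hypothetical token counts, NOT measured values from the benchmark:
# same input for both models, output tokens doubled for the newer one.
old_cost = run_cost(input_tokens=1_000_000, output_tokens=300_000)
new_cost = run_cost(input_tokens=1_000_000, output_tokens=600_000)

ratio = new_cost / old_cost
print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}, ratio: {ratio:.2f}x")
```

With these assumed counts the input side of the bill stays fixed, so the total grows by less than 2x even though output tokens double. The exact ratio depends on the input/output mix of the workload.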

dwaltrip 14 days ago

ah ok, thanks for clarifying!

spprashant 14 days ago

If it spends 2x the tokens to achieve the same result, that's effectively 2x the cost, in a manner of speaking.

SupLockDef 14 days ago

Releasing a new model is the new way to jack up the price hehe.

eshack94 14 days ago

That's exactly right.
