HN Mirror



	by charleyslee 21 days ago
	tysm for posting this! i'm charley, cofounder of datacurve, we created this benchmark and my team and i are here to answer any q's.

2 comments

ammar_x 21 days ago

Absolutely! We need new and better benchmarks like this.

I have a question: why not use the maximum available reasoning on each LLM? For example, I see that Opus 4.7 is at `max` reasoning but Sonnet 4.6 is at `high`. Wouldn't it be a fairer comparison if all were at max?


davidshepherd7 21 days ago

Did you try Opus-4.7 on a lower reasoning level? Looks like on `max` it's using far more tokens than the other frontier models.
