Hacker News
by charleyslee 21 days ago
tysm for posting this! i'm charley, cofounder of datacurve, we created this benchmark and my team and i are here to answer any q's.
2 comments

Absolutely! We need new and better benchmarks like this.

I have a question: why not use the maximum available reasoning level on each LLM? For example, I see that Opus 4.7 is at `max` reasoning but Sonnet 4.6 is at `high`. Wouldn't it be a fairer comparison if all were at max?

Did you try Opus-4.7 on a lower reasoning level? Looks like on `max` it's using far more tokens than the other frontier models.