HN Mirror



	by ammar_x 22 days ago
	Absolutely! We need new and better benchmarks like this. I have a question: why not use the maximum available reasoning setting for each LLM? For example, I see that Opus 4.7 is at `max` reasoning but Sonnet 4.6 is at `high`. Wouldn't it be a fairer comparison if all were at max?