HN Mirror



	by kseniamorph 134 days ago
	Wow, not a bad result on the computer use benchmark for the mini model. For example, Claude Sonnet 4.6 scores 72.5%, almost on par with GPT-5.4 mini (72.1%), but Sonnet costs 4x more on input and 3x more on output.

1 comment

What's the point of this benchmark if Sonnet works great on my tasks and mini can't solve them?