Hacker News
by
kseniamorph
86 days ago
Wow, not a bad result on the computer use benchmark for the mini model. For example, Claude Sonnet 4.6 scores 72.5%, and GPT-5.4 mini is almost on par at 72.1%. But Sonnet costs 4x more on input and 3x more on output.
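To put rough numbers on that trade-off, here is a minimal sketch that only uses the figures quoted above (72.5% vs 72.1%, 4x input, 3x output). The function name `blended_cost_ratio` and the `input_share` parameter are hypothetical, and the 50/50 spend split is an assumption for illustration, not something measured.

```python
def blended_cost_ratio(input_ratio: float, output_ratio: float, input_share: float) -> float:
    """Return how many times more the pricier model costs for a workload.

    input_share is the fraction of the cheaper model's spend that goes to
    input tokens (hypothetical parameter; pick it to match your own workload).
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_ratio * input_share + output_ratio * (1.0 - input_share)


# Figures quoted in the comment above.
SONNET_SCORE = 72.5
MINI_SCORE = 72.1
INPUT_RATIO = 4.0   # Sonnet input price relative to mini
OUTPUT_RATIO = 3.0  # Sonnet output price relative to mini

# Assumed split: half of mini's spend on input, half on output.
ratio = blended_cost_ratio(INPUT_RATIO, OUTPUT_RATIO, input_share=0.5)
score_gap = round(SONNET_SCORE - MINI_SCORE, 1)

print(f"Sonnet costs about {ratio:.1f}x as much for a {score_gap}-point higher score")
```

Whatever the split, the ratio stays between 3x and 4x, so on this one benchmark the extra spend buys a 0.4-point difference. It says nothing about how the two models compare on other tasks.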
1 comment
PunchTornado
86 days ago
What's the point of this benchmark if Sonnet works great on my tasks and mini can't solve them?