| HN Mirror


by trb 4 days ago

  Grok 4.1 Fast won 13 of 30 games at $0.97 per win

  The next-best winner was Claude Sonnet 4.6 with 5 wins, at $26.78 per win. That’s a 27x difference. The model that isn’t on most top-model lists beat the model that is, on the thing a routing customer actually cares about.

  The model with the most kills did not win

  GPT 5.4 killed 38 agents across 30 games. More than anyone else. It came in second on the leaderboard with 2 wins.

If Grok 4.1 Fast was the top-winning model and Claude Sonnet 4.6 the second, how did GPT 5.4 come in second on the leaderboard? Which one is second, Claude Sonnet 4.6 or GPT 5.4?

  There were 11 games between “best at killing” and “best at winning”.

What does that mean? How are there 11 games between "best at killing" and "best at winning"?

3 comments

wagwang 4 days ago

That's just how battle royale works.


arczyx 4 days ago

The one who wins is the one who survives to the end. If there are 10 players and you kill 5 but then die immediately, you lose to the player who only killed 1 but became the last man standing.


verall 4 days ago

The idea is really neat, and there's probably an answer here related to last standing vs. kills vs. "scoring" (some combination of the two?), but the article is nearly incoherent because the author did not feel like proofreading their slop.
