|
|
|
|
|
by spider-mario
15 days ago
|
|
> You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output. You definitely can in principle; that’s the entire point of the comment you are responding to. If one tool completes it in 10 minutes with little hand holding, and the other does it in one hour at 4× the cost and while needing a lot of steering, the former is arguably better even if the end result is the same. Whether that’s specifically true and demonstrable of GPT and Claude is another question, but your blanket statement doesn’t hold as a general rule. |
|
I think a more appropriate rephrasing would be 'You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference on dimensions you care about'. In the case of latest of claude code vs codex with gpt 5.5) both are similar enough in the dimensions people will care about in evaluating (vs. differing wildly in cost or time taken).