HN Mirror



	by jnovek 16 days ago
	I can’t tell the difference between code written in vim or VS Code, but it matters substantially to the person writing the code. There’s stuff beyond just the output that goes into tool choice.

5 comments

neosat 16 days ago

Your argument is fine, but it's different from the claim the OP is making. You cannot simply claim that (model + harness) X is better than Y and then show no discernible difference in the output. Subjectively, people might still prefer one over the other for anything from design to marketing, but that's very different from claiming X is better than Y for coding (see: "A colleague was convinced Claude is better"). Basically, "I prefer Claude" is a different claim from "Claude is better", and the latter has a higher bar of proof.

spider-mario 16 days ago

> You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output.

You definitely can in principle; that’s the entire point of the comment you are responding to. If one tool completes it in 10 minutes with little hand holding, and the other does it in one hour at 4× the cost and while needing a lot of steering, the former is arguably better even if the end result is the same.

Whether that’s specifically true and demonstrable of GPT and Claude is another question, but your blanket statement doesn’t hold as a general rule.

neosat 16 days ago

That's a fair callout and I agree my statement was too general in just mentioning 'output', as you correctly pointed out. To define 'better' you would indeed need to agree on the dimensions you would evaluate candidates against.

I think a more appropriate rephrasing would be: 'You cannot simply claim that (model + harness) X is better than Y and then show no discernible difference on the dimensions you care about.' In the case of the latest Claude Code vs. Codex with GPT-5.5, both are similar enough on the dimensions people will care about when evaluating them (as opposed to differing wildly in cost or time taken).

runako 16 days ago

This obviously correct take will get pushback, so let me add some other examples:

- which tool required more detailed goal-setting in the prompt?

- did one tool ask follow-up questions up front vs spread out over implementation?

- did either tool match existing coding styles?

- did either tool remind you about potential conflicts between what you asked it to build and other parts of the codebase?

There are a lot of ways to compare agents besides just the code. (Similarly, working engineers are not evaluated just on their code output.)

SiempreViernes 16 days ago

The colleague implicitly agreed that comparing the output was a valid way to settle the matter as they took part in the test, so they weren't using "better" in the way you propose.

spider-mario 16 days ago

I wasn’t really discussing the colleague, but either way, from:

> A colleague was convinced Claude is better so we played a game. We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code.

I don’t think it’s obvious that they specifically agreed that losing the game meant that. They might just have thought “sure, it might be fun”, if they even gave it that much thought.

“So we played a game” is rather vague, and I feel it’s a bit of a leap to read it as: “as an explicit outcome of their claim that Claude is better, we made a formal bet on whether they could tell the difference in the output, the failure of which would mean a full retraction of their statement”.

skillina 16 days ago

Claude and Codex are tools. You can't tell the difference in the output between something that was done with a ratcheting wrench vs a standard combination wrench, but your mechanic certainly knows the ratcheting wrench is better (for most tasks).

I've not used Codex to compare against, so I'm not claiming X is better than Y, but comparing tools simply on their output is naive.

bluegatty 16 days ago

> You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output

Sorry, I think this misses the mark.

Because it's not the output but the process.

And the outcomes are not always discernible.

Codex and Claude are very different.

I use them for different things.

Their behaviour difference is obvious.

Of course it'd be impossible for anyone to tell by looking at my code base 'how it was written'.

neosat 16 days ago

You need to see the response in light of the original discussion. Referencing here for clarity since I should have included it in the first place: "We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code."

So the same person was using similarly competitive tools and showing that the output was hard to tell apart (with the indirect implication that the implementation was fairly trivial in both cases). A better analogy would not be different processes with widely different tools but, for example, two power drills. Sure, folks could still prefer one over the other, but that's a different claim than saying X is objectively better than Y when both compete directly on very similar dimensions.

Assuming you meant Claude Code: I'd love to learn more about "Codex and Claude are very different", because maybe I'm going just off my own use case, where I use both of them interchangeably for the same thing (coding web and mobile apps).

bluegatty 16 days ago

It's not reasonable to compare results from two different tool sets, especially as they are guided by humans.

The only way a reasonable comparison could be made would be to compare completely automated results from either technology; that would be useful.

For example, creating a 'pre-baked' script and running it on both to see the output.

Codex and Claude are obviously very different, though it's hard to characterize how those differences might apply exactly to a given problem.

Two 'very different power saws' will ultimately build the same home.

jnovek 16 days ago

> A colleague was convinced Claude is better

That’s actually what my comment was based on; raw code output isn’t the only measure of quality. Engineers write better code if they have the tools they prefer.

SiempreViernes 16 days ago

The colleague participated in the test though, so apparently the colleague didn't object to "better" being interpreted as "makes better output".

SiempreViernes 16 days ago

If you told someone "I think vim is better for writing code" and they proposed the comparison above as a way to prove it, would you accept and take part in the test?

Apparently the colleague did take part, so I think the evidence we have is that the colleague agreed with the interpretation of "better" as "produces discernibly better code".

amazingamazing 16 days ago

> There’s stuff beyond just the output that goes into tool choice.

Yup, like billions of capex. Unlike vim.

grayhatter 16 days ago

I'd bet I could tell with a result somewhat better than random chance.

While there is no meaningful difference in the ability to write code, vim has earned its reputation for having a learning curve. I'd argue that this predisposition, the requirement to invest additional energy, will bias the results towards attention to detail and pure minimalism.

davidguetta 15 days ago

Yeah, but you don't claim vim is better.