HN Mirror



	by lunar_mycroft 22 days ago
	> For coding you always want to go with the best model in the category

	This is transparently false, because the best "model" is still competent human developers. They're just more expensive. If you're willing to use current LLMs at all, it means you're willing to sacrifice quality for a better price, and your disagreement with the comment you were replying to is entirely about what the optimum tradeoff is.

2 comments

aspenmartin 22 days ago

Well, it may be false that you always want the best model, but the point is that the performance of you + <agent> is far more cost-effective than you + someone else.

lunar_mycroft 21 days ago

Maybe, but that's a different claim than the one I was responding to. It also raises the question: "if the lower-quality but cheaper output of frontier models is more cost-effective than humans, is the even lower-quality but even cheaper output of OSS models more cost-effective still?" With an absolute rule like GP suggested ("no, you always want the best code generator") the answer is clear, but it gets much murkier if you reject such rules (as you have to in order to be an LLM coding proponent).

aspenmartin 21 days ago

I think that’s a fair question and a good point.

noname120 21 days ago

It was true 6 months ago, not anymore. Frontier models now outperform developers on many tasks, be it on quality/readability/maintainability, and let’s not talk about speed…

lunar_mycroft 21 days ago

I've seen the code they produce without extensive help from human developers; this is clearly false.

Good to see the classic "yeah the models weren't good enough six months ago, but this time they actually are, promise! Please forget you were hearing the exact same thing six months ago!" is alive and well though.

aspenmartin 21 days ago

Are you aware of performance trends though? You’re painting a picture that seems to ignore how things have consistently trended for many years now, even pre ChatGPT. It is absolutely data driven to say “an inflection point has happened within the last 6 months”. And that was also true 6 months ago (when people started using coding agents fairly consistently, since Sonnet 4). And it was true 6 months before that. It’s not as if people say “we’ve fixed all the bugs!” and then nothing changes. I don’t necessarily agree with the parent poster that agents are better than humans, but they are certainly much better at many tasks.

lunar_mycroft 21 days ago

> Are you aware of performance trends though? You’re painting a picture that seems to ignore how things have consistently trended for many years now, even pre ChatGPT.

Models have been getting better, but all that follows from that is that newer models tend to be better than older ones. It doesn't follow that they have (or even will in the future) gotten better than anything else, be that human developers, a given definition of good enough, etc.

> It is absolutely data driven to say “an inflection point has happened within the last 6 months”.

With all due respect to OP (who I think is responsible for popularizing that way of phrasing it), I don't think it is when you consider the actual definition of "inflection point". At best I think you can say that models crossed a lot of developers definition of good enough around then, which is a different thing. The problem I have with that is that as a (mostly) outsider looking in, it doesn't seem like they're right.

aspenmartin 21 days ago

> Models have been getting better, but all that follows from that is that newer models tend to be better than older ones. It doesn't follow that they have (or even will in the future) gotten better than anything else, be that human developers, a given definition of good enough, etc.

But this is not true. You’re saying we only have relative performance numbers and not absolute measures of capabilities and reliability but that’s simply not true. OSS benchmarks, as well as the internal flywheels of these companies, are good complementary measurements.

> At best I think you can say that models crossed a lot of developers definition of good enough around then, which is a different thing

That’s the inflection point. Implication is a massive jump in adoption. We’re not just pulling this out of a hat; there are a number of compelling datapoints. The onus is on people to bring actual evidence that contradicts all of the data and observations we have.

lunar_mycroft 21 days ago

> you’re saying we only have relative performance numbers and not absolute measures of capabilities and reliability but that’s simply not true.

No, I'm saying that the claim you were making ("current models are better than some non-model based standard X") does not follow from your premise ("current models are better than past models"). It's possible that your claim is still true (although I don't think it is for most of the values of X that matter), but that wouldn't change the fact that the argument made is invalid.

As stated, your argument was basically the classic "my 3-month-old is now twice the size he was when he was born" meme, except that the tweet also claimed the kid now outweighed an elephant.

> That’s the inflection point.

No, it isn't. An inflection point is when the direction of curvature changes. If we crossed over into the diminishing returns part of the logistic function, that would be an inflection point (as would the case where we had been in the diminishing returns regime, but then progress went back to speeding up).
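A quick worked check of that definition, assuming (purely for illustration, not as data about any model) that progress follows a standard logistic curve $f(t) = \frac{L}{1 + e^{-k(t - t_0)}}$ with ceiling $L > 0$ and rate $k > 0$. Differentiating twice:

```latex
\begin{aligned}
f'(t)  &= k\,f(t)\left(1 - \frac{f(t)}{L}\right) \\
f''(t) &= k\,f'(t)\left(1 - \frac{2f(t)}{L}\right)
        = k^{2} f(t)\left(1 - \frac{f(t)}{L}\right)\left(1 - \frac{2f(t)}{L}\right)
\end{aligned}
```

Since $0 < f(t) < L$, the sign of $f''$ is the sign of $1 - 2f/L$: positive (progress speeding up) while $t < t_0$, and negative (diminishing returns) once $t > t_0$. The inflection point is the single moment $t = t_0$, where $f = L/2$ and the curvature flips.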

> Implication is a massive jump in adoption.

The point I made was that "a massive jump in adoption" doesn't actually imply "the models are actually good enough now", only that a lot more people think they are.

suddenlybananas 21 days ago

Why is Anthropic hiring software developers then?

aspenmartin 21 days ago

Because they still need them?

suddenlybananas 21 days ago

Why would they still need them if "[f]rontier models now outperform developers on many tasks, be it on quality/readability/maintainability, and let’s not talk about speed"

aspenmartin 21 days ago

Because to replace a SWE you need them to reliably outperform developers on ALL tasks

suddenlybananas 21 days ago

But Anthropic already had plenty of developers. Why would they actively need to hire more if the workload is all being automated?

hajile 21 days ago

That's some really fast goalpost moving.

If AI could outperform humans, Anthropic would NEVER release that model. Instead, they'd use it to create a new Google, Photoshop, Office, Windows, etc. for cheap, then undercut all those companies and take over the entire software industry.