| HN Mirror


by est31 99 days ago

Have you tried the latest models at best settings?

I've been writing software for 20 years, and Rust for 10 of them. I don't consider myself a median coder; I'm quite above average.

For the last 2 years or so, I've been trying out changes with AI models every couple of months, and they have been consistently disappointing. Sure, after edits and many prompts I could get something useful out of them, but often I would spend as much time as, or more than, I would have spent coding it manually.

So yes, while I love technology, I'd been an LLM skeptic for a long time, and for good reason: the models just hadn't been good. While many of my colleagues used AI, I didn't see the appeal. It would take more time and I would still have to think just as much, while it made mistakes everywhere and I had to constantly ask it to correct things.

About 5 months ago, this changed: the models actually figured it out. The February releases sealed things for me.

The models still make mistakes, but their number and severity are lower, and the output fits the specific coding patterns in that file or area. The model no longer imports a random library but uses the one that is already imported. If I ask it not to do something, it complies (earlier iterations just ignored me, which was frustrating).

At least for the software development areas I'm working in (writing databases in Rust), LLMs have turned into a genuinely useful tool. I can now use the fundamental advantage the technology offers, i.e. writing 500 lines of code in 10 minutes, which reduces something that would previously have taken me two to three days to half a day (of course I still need to review it and fix the mistakes and wrong choices the tool made).

Of course this doesn't mean that I am now 6x faster at all coding tasks, because sometimes I need to figure out the best design or such, but

I am talking about Opus 4.6 and Codex 5.3 here, at high+ effort settings. I don't mean tab autocompletion or the quick-edit features of IDEs, but the agentic feature where the IDE can actually spend some effort thinking about what I, the user, meant by my less specific prompt.

2 comments

zozbot234 98 days ago

> I am talking about Opus 4.6 and Codex 5.3 here, at high+ effort settings

So you have to burn tokens at the highest available settings to even have a chance of ending up with code that's not completely terrible (and then only in very specific domains), but of course you then have to review it all and fix all the mistakes it made. So where exactly is the gain? The proper goal is for those 500 lines to be almost always truly comparable to what a human would have written, and not turn into an unmaintainable mess. And AIs aren't there yet.


als0 98 days ago

You really do need to try the latest ones. You can’t extrapolate from your previous experiences.


BoredomIsFun 98 days ago

I do not think they are impartial; all I can see is a lot of angst.


notpachet 98 days ago

I feel like we're talking about different things. You seem to be describing a mode of working that produces output that's good enough to warrant the token cost. That's fine, and I have use cases where I do the same. My gripe was with the parent poster's quote:

> Claude and GPT regularly write programs that are way better than what I would’ve written

What you're describing doesn't sound "way better" than what you would have written by hand, except possibly in terms of the speed that it was written.


est31 98 days ago

Yeah, for me it doesn't write stuff that's way better than mine, at least in areas I'm familiar with. In areas I'm not familiar with, it's way better than what I could have produced.
