by _bin_ 482 days ago
i'm glad they seem to work better for you. i sometimes seem to be the only person out there who can't get the same level of utility out of these models as others.

my guess is for some applications they can, but even reasoning models (o3-mini-high, grok3, sonnet 3.7, o1, deepseek, etc.) often fail to fix logic bugs. note that this isn't necessarily a form validation logic bug I'm referring to but, say, a pretty in-the-weeds tool for cleaning and pre-processing data for ML purposes. my guess is basic business-logic-y type stuff is much more doable.

i haven't really found a good way around cases where it either just adds printfs or loops through the same, non-working fixes repeatedly. they keep getting better, they're just not yet below my epsilon for unreliability.

2 comments

You're not alone. Fixing bugs is usually really easy anyway, and it takes me more effort to feed the context to the LLM than to fix them myself.

The real complicated "bugs" often come from unclear requirements and the hard part is clearing up the requirements. It's more about design than logic errors in the code. And LLMs suck hard at this.

> even reasoning models [...] often fail to fix logic bugs.

I think "often" is the key word here. To be clear, they often fail for me too! But they also often work.

the problem is that something that has gone from working 10% of the time to working 50% of the time still requires me to thoroughly review everything it does 100% of the time. hence my comment about "the intern problem".