by tjansen 129 days ago
> 10 minutes is not the limit for current models. I can have them work for hours on a problem.

Admittedly, I have never tried to run it that long. If 10 minutes is not enough, I check what it is doing and tell it what to do differently, or what to look at, or offer to run it with debug logs. Recently, I also had a case where Opus worked on an issue forever: it fixed one issue and thereby introduced another, then fixed that, only for the original issue to reappear. Then I tried Codex, and it fixed it at first sight. So changing models can certainly help.

But do you really get a good solution after running it for hours? To me, that sounds like it doesn't understand the issue completely.

1 comment

Sometimes it doesn't work, or it gives up early, but since these runs happen when I'm not working, that is not a big deal. When it does work, I would say it has figured out the hard part of the solution. I may need another prompt to clean it up a bit, but it got the hard work out of the way.

> or offer to run it with debug logs.

Letting it add its own debug logs and use a debugger allows it to run these loops itself and understand where its current approach is going wrong.
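As a rough illustration of one step in such a loop, here is a minimal Python sketch of a helper an agent could call to rerun a reproduction command with debug logging on and read back the tail of the output. Everything in it is an assumption made for illustration: the `run_repro` name, the `DEBUG=1` environment variable, and the convention that a non-zero exit code means the bug reproduced are hypothetical, not something any particular tool provides.

```python
import os
import subprocess
import sys


def run_repro(cmd, timeout_s=600, tail_lines=50):
    """Run a reproduction command with debug logs on and summarize the result."""
    # Assumption: the project under test turns on verbose logging via DEBUG=1.
    env = dict(os.environ, DEBUG="1")
    try:
        proc = subprocess.run(
            cmd, env=env, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The run exceeded its time budget; report that instead of hanging.
        return {"reproduced": None, "log_tail": "", "timed_out": True}
    lines = (proc.stdout + proc.stderr).splitlines()
    return {
        # Assumption: a non-zero exit code means the issue still reproduces.
        "reproduced": proc.returncode != 0,
        # Only the last lines are returned, to keep the model's context small.
        "log_tail": "\n".join(lines[-tail_lines:]),
        "timed_out": False,
    }


if __name__ == "__main__":
    # Stand-in for a real failing test command.
    result = run_repro([sys.executable, "-c", "import sys; sys.exit(1)"])
    print(result["reproduced"], result["timed_out"])
```

The agent would call something like this after each attempted fix and decide from `reproduced` and `log_tail` whether to continue with the same approach.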

That assumes it can easily reproduce the issues. But it is not as good as a human user at interacting with a complex UI.