Hacker News
by ariwilson 398 days ago
Is there value in adding an overseer LLM that measures the progress between n steps and if it's too low stops and calls out to a human?
4 comments

I don't think you need an overseer for this. You can just have the agent self-assess at each step whether it's making material progress or is caught in a loop, and if it's caught in a loop, pause and emit a prompt for help from a human. This would probably require a bit of tuning, and the agents need to be set up with a blocking "ask for help" function, but it's totally doable.
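For illustration, the self-assessment loop described above could look roughly like this. It is a minimal Python sketch, not anyone's actual implementation: `next_action`, `ask_for_help`, and the "same action repeated `window` times" heuristic are all hypothetical stand-ins, and a real agent would likely need a richer stuck signal.

```python
from typing import Callable, List


def is_looping(recent_actions: List[str], window: int = 3) -> bool:
    """Return True when the last `window` actions are all identical."""
    if len(recent_actions) < window:
        return False
    return len(set(recent_actions[-window:])) == 1


def run_agent(
    next_action: Callable[[List[str]], str],  # hypothetical: wraps the LLM call
    ask_for_help: Callable[[str], str],       # hypothetical: blocks until a human replies
    max_steps: int = 20,
    window: int = 3,
) -> List[str]:
    """Run the agent, pausing for a human whenever it repeats itself."""
    history: List[str] = []
    for _ in range(max_steps):
        action = next_action(history)
        if action == "done":
            break
        history.append(action)
        if is_looping(history, window):
            # Blocking call: the loop does not continue until the human answers.
            hint = ask_for_help(f"Stuck repeating {action!r}; what should I try?")
            history.append(f"human: {hint}")
    return history
```

In an interactive session, `ask_for_help` could be as simple as a wrapper around `input()`; the tuning mentioned above would mostly go into `is_looping`.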
And how does it effectively measure progress?
It can behave just like a senior role would - produce the set of steps for the junior to follow, and assess if the junior appears stuck at any particular step.
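One way to picture the senior/junior split is a plan of steps with an attempt budget per step. This is a rough sketch under assumptions: `Step`, `supervise`, `try_step`, and the `max_attempts` threshold are hypothetical names, and "stuck" is reduced to "too many failed attempts on one step".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    description: str
    attempts: int = 0
    done: bool = False


def supervise(
    steps: List[Step],
    try_step: Callable[[Step], bool],  # hypothetical: the "junior" attempting one step
    max_attempts: int = 3,
) -> Optional[Step]:
    """Work through the plan in order.

    Returns the first step that exhausts its attempt budget (the junior
    appears stuck there), or None if every step finishes.
    """
    for step in steps:
        while not step.done:
            if step.attempts >= max_attempts:
                return step
            step.attempts += 1
            step.done = try_step(step)
    return None
```

The returned step is the natural point at which to escalate to a human.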
I have actually had great success with agentic coding by sitting down with an LLM to tell it what I'm trying to build and have it be Socratic with me, really trying to ask as many questions as it can think of to help tease out my requirements. While it's doing this, it's updating the project readme to outline this vision and create a "planned work" section that is basically a roadmap for an agent to follow.

Once I'm happy that the readme accurately reflects what I want to build and all the architectural/technical/usage challenges have been addressed, I let the agent rip, instructing it to build one thing at a time, then typecheck, lint and test the code to ensure correctness, fixing any errors it finds (and re-running automated checks) before moving on to the next task. Given this workflow I've built complex software using agents with basically no intervention needed, with the exception of rare cases where its testing strategy is flaky in a way that makes it hard to get the tests passing.
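The build-check-fix cycle above can be sketched as a small loop. Everything here is an assumption made for illustration: `implement`, `fix`, and the `checks` mapping are hypothetical callables standing in for the agent and for whatever typecheck, lint, and test commands a project actually uses, and `max_fix_rounds` is an arbitrary cap.

```python
from typing import Callable, Dict, List

# A check returns a list of error messages; an empty list means it passed.
Check = Callable[[], List[str]]


def build_task(
    task: str,
    implement: Callable[[str], None],       # hypothetical: agent writes the code
    fix: Callable[[str, List[str]], None],  # hypothetical: agent repairs reported errors
    checks: Dict[str, Check],               # e.g. typecheck, lint, test
    max_fix_rounds: int = 5,
) -> bool:
    """Implement one task, then re-run all checks until clean or out of rounds."""
    implement(task)
    for round_number in range(max_fix_rounds + 1):
        errors = [f"{name}: {e}" for name, check in checks.items() for e in check()]
        if not errors:
            return True  # all checks pass; safe to move on to the next task
        if round_number < max_fix_rounds:
            fix(task, errors)
    return False  # still failing after the final re-check; needs a human
```

A `False` return corresponds to the rare cases mentioned above, such as a flaky test that never goes green.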

>I have actually had great success with agentic coding by sitting down with an LLM to tell it what I'm trying to build and have it be Socratic with me, really trying to ask as many questions as it can think of to help tease out my requirements.

Just curious, could you expand on the precise tools or way you do this?

For example, do you use the same well-crafted prompt in Claude or Gemini and use their in-house document curation features, or do you use a file in VS Code with Copilot Chat and just say "assist me in writing the requirements for this project in my README, ask questions, perform a socratic discussion with me, build a roadmap"?

You said you had 'great success' and I've found AI to be somewhat underwhelming at times, and I've been wondering if it's because of my choice of models, my very simple prompt engineering, or if my inputs are just insufficient/too complex.

I use Aider with a heavily tuned STYLEGUIDE.md and AI rules document that basically outlines this whole process so I don't have to instruct it every time. My preferred model is Gemini 2.5 Pro, which is definitely by far the best model for this sort of thing (Claude can one-shot some stuff about as well, but for following an engineering process and responding to test errors, it's vastly inferior).
How do you find Aider compares to Claude Code?
Producing the set of steps is the hard part. If you can do that, you don’t need a junior to follow it, you have a program to execute.
It is a task that LLMs are quite good at.
If the LLM actually could generate good steps that helped make forward progress, then there would be no problem at all making agents, but agents are really bad, so LLMs can't be good at that.

If you feel those tips are good then you are just a bad judge of tips. There is a reason self-help books sell so well even though they don't really help anyone: their goal is to offer a lot of tips that sound good because they are vague and general, but that don't really help the reader.

I use agentic LLMs every single day and get tremendous value. Asking the LLM to produce a set of bite-sized tasks with built-in corrective reminders is something that they're really good at. It gives good results.

I'm sorry if you're using it wrong.

If this were true, we wouldn't have senior engineers who delegate. My suggestion is to think a couple more cycles before hitting that reply button. It'll save us all from reading obviously and confidently wrong statements.
AI aren't real people… You do that with real people because you can't just rsync their knowledge.

Only on this website of completely reality-detached individuals would such an obvious comment be needed.

So... you don't think you can give LLMs more knowledge? You're the one operating in a detached reality. The reality is that a ton of engineers are finding LLMs useful, such as the author.

Maybe consider that if you don't find it useful, you're working on problems that it's not good at or, even more likely, you just suck at using the tools.

Anybody who finds value in LLMs has a hard time understanding how one would conclude they are useless and that you can't "give it instructions because that's the hard part", but it's actually really easy to understand. The folks who think this are just bad at it. We aren't living in some detached reality. The reality is that some people are just better than others.

Senior engineers delegate in part because they're coaxed into a faux-management role (all of the responsibilities, none of the privileges). Coding is done by juniors; by the time anyone gains enough experience to finally begin to know what they're doing, they're relegated to "mentoring" and "training" a new cohort of fresh juniors.

Explains a lot about software quality these days.

Or, you know, they are leading big initiatives and can't do it all by themselves. Seniors can also delegate to other seniors. I am beyond senior with 11 YOE and still code on a ton of my initiatives.
Bruh, we're inventing robot PMs for our robot developers now? We're so fucked
Yes, it works really well. We do something like that at NonBioS.ai - longer post below. The agent self-reflects on whether it is stuck or confused and calls on the human for help.