by ojo-rojo 100 days ago
How about a subsequent review where a separate agent analyzes the original issue and the resulting code, and approves it if the code meets the intent of the issue? The principle is to keep an eye out for manual work that you can describe well enough to offload.

Depending on your success rate with agents, you can have one that validates multiple criteria or separate agents for different review criteria.
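A rough sketch of what I mean, in Python. Everything here is made up for illustration: `ask_model` stands in for whatever function sends one prompt to your model and returns its reply, and the APPROVE/REJECT reply convention is just an assumption of this sketch, not any tool's real API. The point is that each reviewer gets only the issue and the code, one criterion at a time, with none of the coding conversation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: any function that sends a single prompt to a model and
# returns its text reply. Not tied to a real library.
ModelCall = Callable[[str], str]


@dataclass
class Verdict:
    criterion: str
    approved: bool
    notes: str


def review(issue: str, code: str, criteria: list[str],
           ask_model: ModelCall) -> list[Verdict]:
    """Run one fresh-context review per criterion."""
    verdicts = []
    for criterion in criteria:
        # The prompt contains only the issue, the code, and one criterion.
        # No history from the agent that wrote the code is passed along.
        prompt = (
            f"Criterion: {criterion}\n"
            f"Issue:\n{issue}\n"
            f"Code:\n{code}\n"
            "Reply with APPROVE or REJECT on the first line, then notes."
        )
        reply = ask_model(prompt)
        first_line, _, rest = reply.partition("\n")
        approved = first_line.strip().upper() == "APPROVE"
        verdicts.append(Verdict(criterion, approved, rest.strip()))
    return verdicts


def all_approved(verdicts: list[Verdict]) -> bool:
    # An empty review is not an approval.
    return bool(verdicts) and all(v.approved for v in verdicts)
```

Passing a single combined criterion gives you the "one agent validates everything" variant; passing several gives you one reviewer call per criterion. Which works better presumably depends on your success rate, as above.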

2 comments

You are fighting nondeterministic behavior with more nondeterministic behavior, or in other words, fighting probability with probability. That doesn't necessarily make things any better.
In my experience, an agent with "fresh eyes", i.e., without the context of being told what to write and writing it, does have a different perspective and is able to be more critical. Chatbots tend to take the entire previous conversational history as a sort of canonical truth, so removing it seems to get rid of any bias the agent has towards the decisions that were made while writing the code.

I know I'm psychologizing the agent. I can't explain it in a different way.

I think of it as them being additively biased, i.e., "don't think about the pink elephant". Not only does this not help LLMs avoid pink elephants, it guarantees that pink elephant information is now being considered in inference when it was not before.

I fear that thinking about problem solving in this manner to make LLMs work is damaging to critical thinking skills.

Fresh eyes, some context, and another LLM.

The problem is information fatigue from all the agents+code itself.

Aren't human coders also nondeterministic?

Assigning different agents to have different focuses has worked for me, especially when you task a code reviewer agent with the goal of critically examining the code. The results will normally be much better than asking the coder agent, who will assure you it's "fully tested and production ready".

Human coders are far more reliable. The only downside is speed, and therefore cost.
Probably true.

(Sorry.)

Slop on slop. Who watches the watchmen?