Hacker News new | ask | show | jobs
by a1j9o94 100 days ago
This is an interesting space. Right now we've gotten to a point where agents can do most tasks, but they will get lazy/skip steps if you're not precise in the requirements. We need ways to validate that expands beyond software tests. This is a good direction but a few thoughts: 1. From what I can tell the agent who does the task is running the validation. Keeping the validation agent as a separate context avoids the validator knowing what the software is supposed to do vs what it does 2. There's a lot of prior art around org structures to validate things that we've built out over the last ~100 years that we can apply in this space. E.g. look at the way that blind trials are run
1 comments

totally agree, and fwiw, nothing in this implementation requires that the agent verifies it themselves. the hope is something that ultimately exists as a verification mechanism on one side of an agent-to-agent interaction/delegation.

because it is true that, even though we've got some adversarial aspects built into the verification, that's not truly blind from the actor (unless you explicitly design the use of these in that way, which is what I've considered as the better design)