by kestiny
32 days ago
A good harness should not only make agents more capable at completing tasks, but also make their outputs much easier to review.
For example, a good harness constrains the agent's action surface, context, and task boundaries.
An agent's failure isn't always a matter of “writing incorrect code”; it can also come from “doing things it wasn't supposed to do.”
Tests and linters can verify part of correctness, but they often can't tell whether a change stayed within the task's scope.
A well-designed harness should shift review from “reading the entire diff” to “verifying that the changes stay within the defined task boundaries.”
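A minimal sketch of that kind of boundary check, assuming the task scope is written down as a list of glob patterns for files the agent may touch, and that the harness can list the paths a change modified (for example from `git diff --name-only`). `TaskScope` and `out_of_scope` are hypothetical names for illustration, not part of any particular harness.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class TaskScope:
    """Hypothetical task boundary: glob patterns for paths the agent may modify."""
    allowed: tuple[str, ...]


def out_of_scope(changed_paths: list[str], scope: TaskScope) -> list[str]:
    """Return the changed paths that match none of the allowed patterns.

    Note: fnmatch's `*` also matches `/`, so "src/auth/*" covers nested
    directories under src/auth as well.
    """
    return [
        path
        for path in changed_paths
        if not any(fnmatch(path, pattern) for pattern in scope.allowed)
    ]


if __name__ == "__main__":
    # Hypothetical task: "fix the login bug", limited to auth code and its tests.
    scope = TaskScope(allowed=("src/auth/*", "tests/test_auth*.py"))
    changed = [
        "src/auth/login.py",
        "tests/test_auth_login.py",
        "src/billing/invoice.py",
        "setup.cfg",
    ]
    print(out_of_scope(changed, scope))  # ['src/billing/invoice.py', 'setup.cfg']
```

With a check like this, the reviewer's first question becomes "is anything flagged?" rather than "what did the whole diff do?", and any flagged path is either a scope violation or a sign the task definition was too narrow.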