by bisonbear
97 days ago
I'm becoming convinced that test pass rate is not a great indicator of model quality. Instead, we have to look at agent behavior beyond the test gate: how well it aligns with human intent, and whether it follows the repo's coding standards. I wrote a short blog post about this phenomenon here if you're interested: https://www.stet.sh/blog/both-pass

Also, +1 on placing heavy emphasis on the plan. If you have a good plan, the code becomes trivial. I've started doing a 70/30 or even 80/20 split between time spent on the plan and time spent implementing and reviewing.