|
|
|
|
|
by embedding-shape
84 days ago
|
|
I feel like you're missing the initial context of this conversation (no pun intended): > Like for example a trusted user makes feedback -> feedback gets curated into a ticket by an AI agent, then turned into a PR by an Agent, then reviewed by an Agent, before being deployed by an Agent. Once you add "humans for clarifications and take direction" then yeah, things can be useful, but that's far away from the non-human-involvment-loop earlier described in this thread, which is what people are pushing back against. Of course, involving people makes things better, that's the entire point here, and that by removing the human, you won't get as good results. Going back to benchmarks, obviously involving humans aren't possible here, so again we're back to being unable to score these processes at all. |
|
Benchmarks ==> it's absolutely not a given that humans can't be involved in the loop of performance measurement. Why would that be the case?