|
|
|
|
|
by padolsey
306 days ago
|
|
> This is cool, but I’m a little skeptical. If Parachute uses AI agents to evaluate other models, who’s evaluating the AI agents? Usually you can run human-in-the-loop spot checks to ensure that there's parity between your LLM evaluators and the equivalent specialist human evaluator. |
|