by JamesSwift
175 days ago
How would you evaluate it if the agent were not a fuzzy logic machine? The issue isn't the LLM; it's that verification is actually the hard part. In any case, this is typically called “evals”, and you can probably craft a test harness to evaluate these if you think about it hard enough.
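As a rough illustration of what such a harness might look like, here is a minimal sketch. It assumes each eval case pairs a prompt with a verifier function, and that the agent is just a callable from prompt to output. `EvalCase`, `run_evals`, and `toy_agent` are all hypothetical names made up for this example, not any particular eval framework. Because agent output is nondeterministic, each case is run several times and scored as a pass rate rather than a single pass/fail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    # Verifier: returns True if the agent's output is acceptable.
    # Writing these checks is the hard part the comment refers to.
    check: Callable[[str], bool]


def run_evals(agent: Callable[[str], str], cases: list[EvalCase],
              trials: int = 5) -> dict[str, float]:
    """Run each case `trials` times and return the pass rate per case."""
    results: dict[str, float] = {}
    for case in cases:
        passes = sum(1 for _ in range(trials) if case.check(agent(case.prompt)))
        results[case.name] = passes / trials
    return results


# Hypothetical stand-in for the real agent under test.
def toy_agent(prompt: str) -> str:
    if "add" in prompt:
        return "5"
    return "I don't know"


cases = [
    EvalCase("addition", "add 2 and 3", lambda out: out.strip() == "5"),
    EvalCase("capital", "capital of France?", lambda out: "paris" in out.lower()),
]

if __name__ == "__main__":
    for name, rate in run_evals(toy_agent, cases).items():
        print(f"{name}: {rate:.0%} pass rate")
```

With a real LLM-backed agent you would swap `toy_agent` for the actual call and expect pass rates between 0 and 1. The `check` functions are where most of the effort goes.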