by jgraettinger1
335 days ago
> You can't do that for LLM output.

That's true if you're just evaluating the final answer. However, wouldn't you evaluate the context -- including internal tokens -- built by the LLM under test? In essence, the evaluator's job isn't to do separate fact-finding, but to evaluate whether the LLM under test made good decisions given the facts at hand.