by Lienetic
696 days ago
Where can I learn more about the metrics you support and how they work? I tried several other solutions but kept running into the same problem: occasionally the framework would give me a score or evaluation of an LLM response that didn't make any sense, with minimal information about how it arrived at that score. Often I'd end up digging into the framework's implementation to find the underlying evaluation prompt or classifier, only to realize that the metric name was confusing or the results were low-confidence. I'm more cautious about using these tools now, and I look more closely at how they work so that I can assess grading quality before relying on them to identify problematic outputs (e.g., hallucinations).
One needs to think of these metrics as a way to filter the data for potential issues, not as the final evaluation criterion. The gold standard should be human evaluators.