by nirga
701 days ago
I think it depends on the use case and on how you define hallucinations. We've seen our metrics perform well (that is, correlate with human feedback) for use cases like summarization, RAG question-answering pipelines, and entity extraction. At the end of the day, things like "answer relevancy" are pretty binary, in the sense that a human evaluator can usually tell clearly whether an answer addresses the question or not. Could you elaborate on why you claim that hallucinations can't be detected with any certainty?