by nirga 701 days ago
I think it depends on the use case and how you define hallucinations. We've seen our metrics perform well (i.e., they correlate with human feedback) for use cases like summarization, RAG question-answering pipelines, and entity extraction.

At the end of the day, things like "answer relevancy" are fairly dichotomous, in the sense that a human evaluator can usually tell clearly whether an answer addresses the question or not.
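
Since relevancy judgments like this are mostly yes/no, one way to compare a continuous metric against them is to threshold the metric and measure agreement with the human labels. The sketch below uses Cohen's kappa, which corrects raw agreement for agreement expected by chance. The 0.5 threshold and the function names are hypothetical choices for illustration, not part of any specific tool.

```python
def binarize(scores: list[float], threshold: float = 0.5) -> list[bool]:
    """Turn continuous metric scores into yes/no verdicts (threshold is an assumption)."""
    return [s >= threshold for s in scores]


def cohens_kappa(a: list[bool], b: list[bool]) -> float:
    """Cohen's kappa for two raters giving binary labels to the same items."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty label lists of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # fraction of "relevant" labels from rater a
    p_b = sum(b) / n  # fraction of "relevant" labels from rater b
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


# Hypothetical toy data, for illustration only.
metric_verdicts = binarize([0.92, 0.10, 0.67, 0.33])
human_verdicts = [True, False, True, True]
print(cohens_kappa(metric_verdicts, human_verdicts))
```

Kappa is 1 for perfect agreement, around 0 for chance-level agreement, and negative when the metric systematically disagrees with the humans.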

Could you elaborate on why you claim there's no way to detect hallucinations with any certainty?