by varunkrishnan17 657 days ago
Thanks for the well-thought-out question, Jadiker!

This is a potential limitation of n-gram precision with context matching, which we used in the RAG demo for simplicity (though even with this, I don't think the effect would be so extreme :-) )
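
To make the idea concrete, here is a minimal sketch of n-gram precision against a retrieved context. It is for illustration only and is not the demo's actual code; the function names, the whitespace tokenization, and the default of n=2 are all assumptions.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(response, context, n=2):
    """Fraction of the response's n-grams that also appear in the context.

    Assumes naive lowercase whitespace tokenization; a real system would
    likely use a proper tokenizer.
    """
    resp = ngrams(response.lower().split(), n)
    if not resp:
        return 0.0  # response too short to form any n-gram
    ctx = set(ngrams(context.lower().split(), n))
    return sum(g in ctx for g in resp) / len(resp)


context = "the eiffel tower is in paris"
verbatim = ngram_precision("the eiffel tower is in paris", context)
paraphrase = ngram_precision("paris is home to the eiffel tower", context)
print(verbatim, paraphrase)
```

The paraphrase is fully supported by the context but scores much lower than the verbatim copy, because only exact n-gram overlap counts. That is the kind of blind spot a purely lexical metric can have.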

We already offer two other hallucination detection approaches that should mitigate this problem: an LLM-as-a-judge model for evaluation, and semantic similarity matching. We've also considered using metrics such as BERTScore. Do you have other ideas? :-)