Hacker News
by teej 1060 days ago
> This approach reduces the likelihood of hallucinations among LLMs.

This has not been my experience. Did you create any benchmarks as a part of this project?


I am the author of this article. What we tried to do was replicate the simplest implementation of Retrieval-Augmented Language Models (RALMs) by prompting the LLM. There has been a lot of research on this topic recently, such as this work from Meta (https://arxiv.org/pdf/2208.03299v3.pdf). I think it gives a good picture of how RALMs boost performance on general QA tasks.
This idea is a simplified version of Retrieval-Augmented Generation (RAG), which has been studied in various research papers, such as https://arxiv.org/abs/2005.11401.
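For readers unfamiliar with the pattern, "RAG by prompting" roughly means: retrieve the passages most relevant to the question, paste them into the prompt, and tell the model to answer only from them. The sketch below is a toy illustration under stated assumptions, not the article's or the papers' implementation: a simple bag-of-words cosine similarity stands in for the learned dense retrievers those papers use, and the function names and prompt wording are hypothetical. The actual LLM call is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, docs: list[str], k: int = 2) -> list[int]:
    # Hypothetical toy retriever: indices of the k docs most similar to the query.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [i for _, i in scored[:k]]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    # Put the retrieved passages into the prompt and ask the model to stay within them.
    hits = retrieve(query, docs, k)
    context = "\n".join(f"[{i}] {docs[i]}" for i in hits)
    return (
        "Answer the question using only the passages below. "
        "Cite passage numbers, and say \"I don't know\" if they do not contain the answer.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
```

Usage: with `docs = ["The Eiffel Tower is in Paris.", "Python was created by Guido van Rossum.", "Paris is the capital of France."]`, calling `build_prompt("Who created Python?", docs, k=1)` includes only passage `[1]` in the context; the resulting string would then be sent to whatever LLM you use. Grounding the answer in retrieved text is what reduces made-up answers, but the model can still ignore or misread the context.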
My experience with RAG is that while it reduces the incidence of hallucinations* significantly (especially if you reduce the LLM temperature to zero at the same time), it doesn't eliminate them.
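On the temperature point: in the usual formulation, the next-token distribution is `softmax(logits / T)`, and as `T` approaches 0 it collapses onto the single most likely token, so decoding becomes greedy and repeatable. The minimal sketch below (function names are hypothetical) shows this. It also hints at why temperature zero reduces but cannot eliminate hallucinations: it removes sampling randomness, but if the most likely token is wrong, greedy decoding will pick it every time.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    # Divide the logits by T, then apply softmax (subtracting the max for stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    # Treat T == 0 as the limit T -> 0+, i.e. greedy argmax decoding.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

For example, with logits `[2.0, 1.0, 0.5]`, `T = 1.0` leaves real probability on the second and third tokens, `T = 0.1` puts nearly all of the mass on the first, and `T = 0` always returns index 0.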

My startup has a product for lawyers that uses RAG to answer legal queries (https://lawlight.ai/). We have a disclaimer that "... (we) do not guarantee the accuracy of answers. You are responsible for reviewing the cited case law and drawing your own independent conclusions."

(This works in this specific context: lawyers are domain experts, and they are supposed to read through all the cases they cite in court anyway.)

* I dislike the term "hallucinations." By definition LLMs hallucinate. It's just that much (or most) of the time, the hallucinations reflect reality.