by budududuroiu
778 days ago
My issue with RAG systems isn't hallucinations. Yes, sure, those are important. My issue is recall. Given a petabyte-scale index of chunks, how can I make sure that my RAG system surfaces the "ground truth" I need, and not just "the most similar vector"? This, I think, is scarier: a healthcare-oriented (or any industry) RAG retrieving a bad but highly linguistically similar answer.
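To make the recall point concrete, here is a rough sketch of how one might measure recall@k against a hand-labeled set of "ground truth" chunks. Everything in it is an assumption for illustration: the toy 2-D vectors, the chunk ids (`correct_dose`, `wrong_dose`, `unrelated`), and the brute-force cosine search standing in for whatever vector index is actually used.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k):
    """Return the ids of the k chunks most similar to the query (brute force)."""
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:k]

def recall_at_k(queries, index, relevant, k):
    """Fraction of labeled ground-truth chunks that appear in the top-k results."""
    hits = 0
    total = 0
    for qid, qvec in queries.items():
        retrieved = set(top_k(qvec, index, k))
        hits += len(retrieved & relevant[qid])
        total += len(relevant[qid])
    return hits / total if total else 0.0

# Hypothetical toy index: "wrong_dose" sits closer to the query than the
# chunk a human labeled as the correct answer.
index = {
    "correct_dose": [0.6, 0.8],
    "wrong_dose": [0.9, 0.44],
    "unrelated": [-1.0, 0.0],
}
queries = {"q1": [1.0, 0.3]}
relevant = {"q1": {"correct_dose"}}  # hand-labeled ground truth

print(top_k(queries["q1"], index, 1))
print(recall_at_k(queries, index, relevant, 1))
print(recall_at_k(queries, index, relevant, 2))
```

In this toy setup the nearest vector is the wrong chunk, so recall@1 is zero even though the retriever did exactly what it was asked to do. The hard part at scale is the `relevant` labels, not the metric.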
Which is a much harder problem to solve outside a few highly standardized niches/industries.
I think synthetic data generation as a means to guide LLMs over a larger-than-optimal search space is going to be quite interesting.