by mif
789 days ago
For those who, like me, don’t know what RAG is: RAG stands for Retrieval Augmented Generation. From the video in this IBM post [0], I understand that it is a way for the LLM to check what its sources are and how recent its information is. Based on that, it could, in principle, say “I don’t know” instead of “hallucinating” an answer. RAG is a way to implement this feature for LLMs.

[0] https://research.ibm.com/blog/retrieval-augmented-generation...
The art of implementing RAG is deciding what text should be pasted into the prompt in order to get the best possible results.
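A minimal sketch of the "paste text into the prompt" step, assuming passages already arrive ranked most-relevant first and that the context budget is measured in characters (real models count tokens). The function name `build_prompt` and the prompt wording are hypothetical, chosen only for illustration:

```python
def build_prompt(question: str, passages: list[str], max_chars: int = 2000) -> str:
    """Paste ranked passages into a prompt until a character budget runs out.

    HYPOTHETICAL helper: the budget is in characters, not model tokens.
    """
    header = "Answer the question using only the context below.\n\nContext:\n"
    footer = f"\nQuestion: {question}\nAnswer:"
    budget = max_chars - len(header) - len(footer)
    chosen = []
    for passage in passages:
        chunk = f"- {passage}\n"
        # Stop at the first passage that does not fit, so a lower-ranked
        # short passage never displaces a higher-ranked long one.
        if len(chunk) > budget:
            break
        chosen.append(chunk)
        budget -= len(chunk)
    return header + "".join(chosen) + footer


if __name__ == "__main__":
    passages = [
        "RAG stands for Retrieval Augmented Generation.",
        "Retrieved text is pasted into the prompt before the question.",
    ]
    print(build_prompt("What does RAG stand for?", passages))
```

Deciding which passages reach that list, and in what order, is where the "art" lies; the sketches below show two ways of producing them.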
A popular way to implement RAG is using similarity search via vector search indexes against embeddings (which I explained at length here: https://simonwillison.net/2023/Oct/23/embeddings/). The idea is to find the content that is semantically most similar to the user's question (or the likely answer to their question) and include extracts from that in the prompt.
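A rough sketch of the similarity-search idea, assuming a hypothetical `embed` function. A real system would call an embedding model and query a vector index; here `embed` is a bag-of-words count vector (so it only captures shared words, not meaning) and the "index" is a brute-force scan ranked by cosine similarity:

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # HYPOTHETICAL stand-in for an embedding model: word counts only.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(question: str, documents: list[str], k: int = 2) -> list[str]:
    # Brute-force scan over every document; a vector index replaces this at scale.
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


if __name__ == "__main__":
    docs = [
        "Pelicans are large water birds.",
        "Embeddings map text to vectors of numbers.",
        "Vector search finds the nearest embeddings to a query.",
    ]
    print(top_k("How does vector search use embeddings?", docs))
```

The returned extracts are what would then be pasted into the prompt.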
But you don't actually need vector indexes or embeddings at all to implement RAG.
Another approach is to take the user's question, extract some search terms from it (often by asking an LLM to invent some searches relating to the question), run those searches against a regular full-text search engine and then paste results from those searches back into the prompt.
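A minimal sketch of the full-text variant, under loud assumptions. `generate_searches` is a hypothetical stand-in for asking an LLM to invent searches (it just drops stopwords from the question), and the "search engine" is a tiny inverted index that ranks documents by how many distinct search terms they contain, a deliberately simpler scheme than the BM25-style ranking real engines typically use:

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "do", "does",
             "of", "to", "in", "and", "for"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def generate_searches(question: str) -> list[str]:
    # HYPOTHETICAL stand-in for an LLM inventing search queries.
    return [t for t in tokenize(question) if t not in STOPWORDS]


class FullTextIndex:
    """Minimal inverted index mapping each term to the ids of documents containing it."""

    def __init__(self, documents: list[str]):
        self.documents = documents
        self.postings = defaultdict(set)
        for doc_id, doc in enumerate(documents):
            for term in tokenize(doc):
                self.postings[term].add(doc_id)

    def search(self, terms: list[str], limit: int = 3) -> list[str]:
        # Score = number of distinct search terms a document contains.
        scores = defaultdict(int)
        for term in set(terms):
            for doc_id in self.postings.get(term, ()):
                scores[doc_id] += 1
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [self.documents[d] for d in ranked[:limit]]


def rag_prompt(question: str, index: FullTextIndex) -> str:
    # Run the searches, then paste the results back into the prompt.
    results = index.search(generate_searches(question))
    context = "\n".join(f"- {r}" for r in results)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    index = FullTextIndex([
        "Full-text search engines match keywords.",
        "Pelicans eat fish.",
        "A search engine returns ranked results for keywords.",
    ])
    print(rag_prompt("How do search engines rank keywords?", index))
```

No embeddings or vector indexes are involved: retrieval here is plain keyword matching.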
Bing, Perplexity and Google Gemini are all examples of systems that use this trick.