Hacker News new | ask | show | jobs
by loudmax 842 days ago
That's not how RAG works. What you're describing is something closer to prompt optimization.

Sibling comment from discordance has a more accurate description of RAG. There's a longer description from Nvidia here: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-ge...

1 comments

Right, you read something nebulous about how "the LLM combines the retrieved words and its own response to the query into a final answer it presents to the user", and you think there is some magic going on, and then you click one link deeper and read at https://ai.meta.com/blog/retrieval-augmented-generation-stre... :

> Given the prompt “When did the first mammal appear on Earth?” for instance, RAG might surface documents for “Mammal,” “History of Earth,” and “Evolution of Mammals.” These supporting documents are then concatenated as context with the original input and fed to the [...] model

Finding the relevant context to put in the prompt is a search problem, nearest neighbour search on embeddings is one basic way to do it but the singular focus on "vector databases" is a bit of hype phenomenon IMO - a real world product should factor a lot more than just pure textual content into the relevancy score. Or is your personal AI assistant going to treat emails from yesterday as equally relevant as emails from a year ago?