RAG is taking a bunch of docs and chunking them into text blocks of a certain length (how best to do this is up for debate), then creating a search API that takes a query (like a Google search) and compares it to the document chunks (very much how you're describing). Take the returned chunks, ignore the score from the vector search, and feed those chunks into a re-ranker along with the original query (this step is important: vector search mostly sucks). Filter the re-ranked chunks down to the top 1 or 2 results, then format a prompt like:

The user asked 'long query'. We fetched some docs (see below). Answer the query based on the docs (reference the docs if you feel like it).

Doc1.pdf - Chunk N
Eat cheese

Doc2.pdf - Chunk Y
Don't eat cheese

You then expose the search API as a "tool" for the LLM to call, slightly reformatting the prompt above into a multi-turn convo, and suddenly you're in ze money. But once your users are happy with those results they'll want something dumb like the latest football scores, then you need a web tool, and then it never ends. To be fair though, it's pretty powerful once you've got it in place.
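To make the flow above concrete, here's a rough, dependency-free sketch of the whole loop: chunk, vector search (scores thrown away), rerank, keep the top 2, then either stuff the chunks into a prompt or serve them from a tool call. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `rerank_score` is a hypothetical placeholder for a proper cross-encoder reranker, the word-window chunking is just one of the many debatable strategies, and `search_docs` / `SEARCH_TOOL` are made-up names. The tool description follows the OpenAI-style function-calling shape; other providers use similar JSON Schema with different field names.

```python
import json
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc: str    # source document name, e.g. "Doc1.pdf"
    index: int  # position of the chunk within its document
    text: str


def chunk_document(doc: str, text: str, size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split a document into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    step = size - overlap  # assumes size > overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [Chunk(doc, i, " ".join(words[s:s + size])) for i, s in enumerate(starts)]


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would call an embedding model here."""
    return Counter(tokens(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_search(query: str, corpus: list[Chunk], k: int = 10) -> list[Chunk]:
    """Return the k most similar chunks; the similarity scores are dropped on purpose."""
    q = embed(query)
    return sorted(corpus, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]


def rerank_score(query: str, chunk: Chunk) -> float:
    """Hypothetical placeholder for a cross-encoder that scores (query, chunk) together."""
    q = set(tokens(query))
    words = tokens(chunk.text)
    return sum(w in q for w in words) / (len(words) or 1)


def rerank(query: str, candidates: list[Chunk], top_n: int = 2) -> list[Chunk]:
    """Re-order the candidates by the reranker's score and keep only the top_n."""
    return sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)[:top_n]


def format_chunks(chunks: list[Chunk]) -> str:
    return "\n\n".join(f"{c.doc} - Chunk {c.index}\n{c.text}" for c in chunks)


def search_docs(query: str, corpus: list[Chunk]) -> str:
    """The whole retrieval step: vector search, rerank, then format the surviving chunks."""
    return format_chunks(rerank(query, vector_search(query, corpus)))


def build_prompt(query: str, corpus: list[Chunk]) -> str:
    """Single-turn version: stuff the retrieved chunks straight into the prompt."""
    return (f"The user asked '{query}'. We fetched some docs (see below). "
            "Answer the query based on the docs (reference them if useful).\n\n"
            + search_docs(query, corpus))


# Tool description in the OpenAI-style function-calling shape; other providers use
# similar JSON Schema but different field names. "search_docs" is a hypothetical name.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search the internal documents and return the most relevant chunks.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def run_tool_call(name: str, arguments: str, corpus: list[Chunk]) -> str:
    """Dispatch a tool call from the model (arguments arrive as a JSON string)."""
    if name != "search_docs":
        raise ValueError(f"unknown tool: {name}")
    return search_docs(json.loads(arguments)["query"], corpus)
```

You'd build the corpus once with something like `chunk_document("Doc1.pdf", text1) + chunk_document("Doc2.pdf", text2)`, then either call `build_prompt(query, corpus)` for the one-shot version, or pass `SEARCH_TOOL` to the model and route its tool calls through `run_tool_call` for the multi-turn version. Adding the web tool later is "just" another entry in the tool list and another branch in the dispatcher, which is exactly where it stops ending.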