| Essentially, you take any decent model trained to regurgitate factual information, or really any reasonably well-rounded model, such as a Llama 2 variant. Then you craft a prompt for the model along the lines of "you are a helpful assistant, you will provide an answer based on the provided information. If no information matches simply respond with 'I don't know that'". Next, you take all of your documents and divide them into meaningful chunks, e.g. by paragraph. Then you create embeddings for these chunks. An embedding model is a different kind of model (not an LLM) that turns a string of text into a vector, trained so that texts with similar _meaning_ get similar vectors. E.g. if I generate an embedding for the phrase "I have a dog", it might (simplified) be a vector like [0.1,0.2,0.3,0.4]. This vector can be seen as a point in a multidimensional space. To show what the model does with meaning, pretend for a moment that each vector is a single number: "cat" might embed as [0.42]. Now say we want to search for the query "which pets do I have". First we generate an embedding for this phrase, and the word "pet" might come out as [0.41]. Because the embedding is based on trained meaning, the vector for "pet" will sit close to the vectors for "cat" and "dog" in our multidimensional space. We can choose how strict we want to be with this search (basically a limit on how close two vectors need to be in space to count as a match). The next step is to put all of this into a vector database, a DB designed with vector search operations in mind. We store each chunk, the part of the file it came from, and that chunk's embedding vector in the database. Then, when the LLM is queried, say with "which pets do I have?", we first generate an embedding for the query, then use that vector to query our database for things that are close enough in space to be relevant but loose enough that we also get "connected" words. 
This gives us a bunch of our chunks, ranked by how close each chunk's vector is to our query vector in the multidimensional space. We can then take the n highest-ranked chunks, concatenate their original text, and prepend this to our original LLM query. The LLM then digests this information and responds in natural language. So the query sent to the LLM might be something like: "you are a helpful assistant, you will provide an answer based on the provided information. If no information matches simply respond with 'I don't know that' Information: I have a dog, my dog likes steak, my dog's name is Fenrir User query: which pets do I have?" Everything under "Information" is passed in from the chunked text returned by the vector DB. And the response to that LLM query would of course be something like "You have a dog, its name is Fenrir and it likes steak." |
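To make the steps above concrete, here is a rough Python sketch of the whole flow: chunk, embed, store, search, then build the prompt. Big caveat: nothing here is a real embedding model or a real vector database. `TOY_WORD_VECTORS` is a hand-picked, made-up lookup table standing in for a trained model, the "database" is just a list in memory, and the names `chunk_by_paragraph`, `embed`, `build_index`, `retrieve`, `build_prompt` and the file name `notes.txt` are all hypothetical. Cosine similarity is one common way to measure "closeness"; the original description doesn't say which measure is used.

```python
import math
import re

# ASSUMPTION: hand-picked toy vectors standing in for a trained embedding model.
# Axes are loosely (animal, food, naming); real models learn far more dimensions.
TOY_WORD_VECTORS = {
    "dog": (0.9, 0.1, 0.0),
    "cat": (0.9, 0.0, 0.1),
    "pet": (0.85, 0.05, 0.05),
    "pets": (0.85, 0.05, 0.05),
    "steak": (0.1, 0.9, 0.0),
    "name": (0.1, 0.0, 0.9),
}

SYSTEM_PROMPT = (
    "you are a helpful assistant, you will provide an answer based on the "
    "provided information. If no information matches simply respond with "
    "'I don't know that'"
)


def chunk_by_paragraph(document):
    """Split a document into non-empty paragraph chunks."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def embed(text):
    """Toy embedding: average the vectors of the known words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    known = [TOY_WORD_VECTORS[w] for w in words if w in TOY_WORD_VECTORS]
    if not known:
        return (0.0, 0.0, 0.0)
    return tuple(sum(axis) / len(known) for axis in zip(*known))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def build_index(document, source):
    """Stand-in for a vector database: a list of text/source/vector records."""
    return [
        {"text": chunk, "source": source, "vector": embed(chunk)}
        for chunk in chunk_by_paragraph(document)
    ]


def retrieve(query, index, n=3, min_similarity=0.5):
    """Return the n chunks closest to the query that pass the strictness cutoff."""
    query_vector = embed(query)
    scored = [(cosine_similarity(query_vector, r["vector"]), r) for r in index]
    matches = [(s, r) for s, r in scored if s >= min_similarity]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in matches[:n]]


def build_prompt(query, chunks):
    """Prepend the retrieved chunk text to the user's query."""
    information = ", ".join(c["text"] for c in chunks)
    return f"{SYSTEM_PROMPT}\nInformation: {information}\nUser query: {query}"


notes = "I have a dog\n\nmy dog likes steak\n\nmy dog's name is Fenrir\n\nthe weather is sunny"
index = build_index(notes, source="notes.txt")
top_chunks = retrieve("which pets do I have?", index)
prompt = build_prompt("which pets do I have?", top_chunks)
print(prompt)  # this string is what would be sent to the LLM
```

The `min_similarity` cutoff is the "how strict do we want to be" knob: the unrelated weather chunk scores 0 against the query and never reaches the prompt, while the three dog chunks do. In a real setup you would swap `embed` for an actual embedding model and `index` for an actual vector database, but the shape of the flow should stay roughly the same.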
(Seems like this is what reinforcement training is, but I'm just not sure. Everything seems to mush together when talking about GPT's logic.)