As an LLM novice, can someone explain what these "your document" apps are doing? My understanding is that GPT-4 doesn't support fine-tuning, and 50MB is too large to add to the prompt (which would be too expensive anyway).
Hey! I'm the developer of Unriddle - it works using text embeddings. The document is split into small chunks and each chunk is assigned a numerical representation, or "vector", of its semantic meaning and relation to the other chunks. When a user prompts this too is assigned a vector and then compared to the rest of the chunks. The similar chunks are then fed into GPT-4 along with the query, ensuring the total number of words doesn't exceed the context window limit.
It's just the GPT-4 API - the chunks are sent as part of a prompt. In that case it won't use data from all chunks but it will try to find any chunks that provide descriptions of the document. I've found with research papers, for example, it fetches parts of the introduction and abstract.
Oh so there is pre-processing to find the useful portions? What are you using for the pre-processing?
I feel that it's inevitable that OpenAI et al. will be able to handle large PDF documents eventually. But until then I'm sure there's a lot of value of in this kind of pre-processing/chunking.
Yeah I think you're right - the 32k context window for GPT-4 (not available for everyone yet) is already enough for research papers. I'm using a library called Langchain, there's also LlamaIndex.
Vectorisation is done via OpenAI's embedding API. And the chunking/querying is happens through the Langchain library. But there are a few different ways of doing it - another good library is LLamaIndex.
Thanks a lot! Do you _have_ to do vectorization and querying with the same LLM? Can someone do vectorization with 1 and do querying with reevant chunks with another?
Simply speaking - They chunk the document (make it smaller so that it can be sent to gpt) and then vectorize it (change it to numbers / vector array). From there that is stored in a vector store - now, when you query you first query your vector store for the context (part of the 50MB file) and then send the context along with the question to GPT.
You are right GPT-4 doesn't support fine-tuning but, I think (in general) people might be misunderstanding what fine-tuning does.
Good explanation. Thanks! Can the first part, i.e. vectorizing and finding relevant chunks be done with any LLM (e.g. a self hosted one) and the second part, i.e. querying relevant chunks be done with OpenAI?