by jkukul
1112 days ago
For most applications, packaging all the data and submitting it to OpenAI won't be feasible due to the limited token window size. I think the most common design pattern nowadays goes like this:

1. Chunk all your data (e.g. per paragraph of content)
2. Generate an embedding for each chunk
3. Index embeddings in a vector database
4. When a query comes in, find chunks relevant to the query (based on embedding similarity) and send ONLY the relevant chunks + query to an LLM to formulate the answer

Quickly glancing through the repository from this post, I can see that it also follows this pattern. It uses OpenAI's embedding API for step 2 and Pinecone DB for step 3.
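To make the four steps concrete, here is a rough, self-contained sketch. It is not the code from the repository: the bag-of-words `embed` function is a toy stand-in for a real embedding API call, the in-memory list stands in for a vector database such as Pinecone, and all function names are made up for illustration.

```python
import math
import re
from collections import Counter


def chunk(text):
    """Step 1: split the data into paragraph chunks (blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text):
    """Step 2: toy embedding, a bag-of-words count vector.
    Assumption: a real system would call an embedding API here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Step 3: an in-memory list standing in for a vector database."""
    return [(c, embed(c)) for c in chunks]


def retrieve(index, query, k=2):
    """Step 4a: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(index, query, k=2):
    """Step 4b: put ONLY the relevant chunks plus the query into the prompt."""
    context = "\n---\n".join(retrieve(index, query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The string returned by `build_prompt` is what you would send to the LLM, so the prompt size depends on `k` and the chunk size rather than on the size of the whole data set.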
I don't think it is so much the context window size, because you would chunk your data anyway. I think the counterargument is either that finetuning is limited by the risk of overfitting and catastrophic forgetting, or that it is cost-prohibitive. I think it is more the former. Am I on the right track with these arguments?
Another point to consider is that the vector DB contains an exact copy of your data, so you get that exact text back as a result, whereas the model will only be able to answer vaguely or by paraphrasing.