by EngineeringStuf
482 days ago
Am I correct in reading that the RAG pipeline runs in real time in response to a user query? If so, I would suggest running it ahead of time instead: for each semantically split chunk, have the LLM generate the questions that chunk could answer, and store their embeddings. That way, at query time you only need to compare the query embedding against the pre-generated question embeddings, and the expensive work is already done. The trick, of course, is chunking correctly and generating the right questions, but in both cases I would look to the LLM to do that. Happy to recommend some tips on semantically splitting documents using the LLM with really low token usage if you're interested.
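To make the idea concrete, here is a rough sketch of the offline/online split. Everything in it is an assumption for illustration: `generate_questions` is a hypothetical placeholder for the LLM call (it returns canned questions here), and `embed` is a toy bag-of-words vector standing in for a real embedding model. Treat it as an outline of the approach, not a reference implementation.

```python
import math
from collections import Counter

# Example chunks (made up for illustration).
CHUNKS = [
    "Refunds are issued within 14 days of purchase.",
    "Shipping takes 3 to 5 business days.",
]

def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    words = (w.strip(".,?!:;").lower() for w in text.split())
    return Counter(w for w in words if w)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def generate_questions(chunk):
    """Hypothetical placeholder: in practice an LLM would write these."""
    canned = {
        CHUNKS[0]: ["How long do I have to get a refund?",
                    "What is the refund policy?"],
        CHUNKS[1]: ["How long does shipping take?",
                    "When will my order arrive?"],
    }
    return canned.get(chunk, [])

def build_index(chunks):
    """Offline step: embed every generated question and link it to its chunk."""
    index = []
    for chunk in chunks:
        for question in generate_questions(chunk):
            index.append((embed(question), question, chunk))
    return index

def retrieve(index, query, top_k=1):
    """Query-time step: rank stored questions by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda e: cosine(query_vec, e[0]), reverse=True)
    results = []
    for _, _, chunk in ranked:
        if chunk not in results:  # several questions can point at one chunk
            results.append(chunk)
        if len(results) == top_k:
            break
    return results

if __name__ == "__main__":
    index = build_index(CHUNKS)
    print(retrieve(index, "how long does shipping take"))
```

The point of the design is that user queries tend to look more like questions than like the source text, so matching question-to-question can be easier than matching question-to-chunk. With a real embedding model you would also swap the linear scan for a vector index.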
Go on please :)