by gavmor
810 days ago
Thanks for sharing! I look forward to playing with this once I get off my phone. Took a look at the code, though, to see if you've implemented any of the tricks I've been too lazy to try.

`text_splitter = RecursiveCharacterTextSplitter(chunk_size=8000, chunk_overlap=4000)`

Does this simple numeric chunking approach actually work? Or are more sophisticated splitting rules going to make a difference?

`vector_store_ppt = FAISS.from_documents(text_chunks_ppt, embeddings)`

So we're embedding all 8000 chars behind a single vector index. I wonder if certain documents perform better at this fidelity than others. To say nothing of missed "prompt expansion" opportunities.
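For anyone wondering what those two numbers imply, here is a rough sketch of fixed-size chunking with overlap. It is not LangChain's implementation: as far as I know the real `RecursiveCharacterTextSplitter` first tries to split on separators (paragraphs, lines, spaces) and only then packs pieces up to `chunk_size`, so its chunk boundaries will differ. The function name `sliding_chunks` is made up for illustration.

```python
def sliding_chunks(text: str, chunk_size: int = 8000, chunk_overlap: int = 4000) -> list[str]:
    """Cut text into fixed-size character windows that overlap.

    Simplified stand-in for illustration only; it ignores separators entirely.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each window starts this many characters after the previous one.
    stride = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# With a 50% overlap, most characters land in two chunks,
# so roughly twice the text gets embedded.
doc = "x" * 20000
chunks = sliding_chunks(doc)
print(len(chunks), sum(len(c) for c in chunks))
```

With an overlap of half the chunk size, the stride is 4000 characters, which is why the embedding cost roughly doubles relative to non-overlapping chunks.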
Regarding the index: a mix of BM25 and a vector index usually seems to perform best for most generic data.
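A minimal sketch of what such a mix could look like, in plain Python with no dependencies. Everything here is an assumption for illustration: the BM25 is a bare-bones version with whitespace tokenization, the "embeddings" are hand-written toy vectors standing in for a real embedding model, and reciprocal rank fusion (RRF) is just one common way to merge the two rankings, not necessarily what any given library does.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every doc against the query with a bare-bones BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between one query vector and each doc vector."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0
    return [cos(query_vec, v) for v in doc_vecs]


def to_ranking(scores):
    """Doc indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all rankings."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "faiss vector index",
    "bm25 keyword search",
    "hybrid search with bm25 and vectors",
]
# Toy 2-d vectors, hand-written stand-ins for real embeddings.
doc_vecs = [[1.0, 0.1], [0.1, 1.0], [0.7, 0.7]]
query_vec = [0.6, 0.8]

keyword_rank = to_ranking(bm25_scores("bm25 search", docs))
vector_rank = to_ranking(cosine_scores(query_vec, doc_vecs))
print(rrf([keyword_rank, vector_rank]))
```

RRF only looks at ranks, not raw scores, which sidesteps the problem that BM25 scores and cosine similarities live on different scales.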