Hacker News new | ask | show | jobs
by Dachande663 1165 days ago
If you look at something like LangChain[0], it supports/recommends splitting larger documents into smaller chunks. In this way, when doing something like semantic search you can get the specific paragraph/section that holds the closest relevance, rather than having to read the entire document again (think of a 100 page PDF).

https://python.langchain.com/en/latest/modules/indexes/text_...