Hacker News
by behnamoh 1122 days ago
When you split a document into chunks, doesn't some crucial information get cut in half? In that case, you'd probably lose it from the context if it was immediately followed by irrelevant text that lowers the chunk's cosine similarity to the query. Is there a "smarter" way to feed documents as context to LLMs?
1 comment

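I don't know if there is a smarter way, but these libraries usually offer an overlap parameter that repeats the last N characters of a chunk at the start of the next chunk, so a passage that straddles a boundary can still show up whole in one of the two chunks, as long as it is no longer than the overlap.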
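
To make the overlap idea concrete, here is a minimal sketch in plain Python. It is not taken from any particular library: the function name `chunk_with_overlap` and the parameters `chunk_size` and `overlap` are made up for illustration, and real splitters typically also try to break on sentence or paragraph boundaries rather than at fixed character offsets.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character chunks where each chunk
    starts with the last `overlap` characters of the previous one.

    Hypothetical helper for illustration; not a real library API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Each new chunk starts `step` characters after the previous one,
    # so consecutive chunks share exactly `overlap` characters.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text, to avoid
        # emitting a trailing chunk that is pure overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

With `chunk_size=4` and `overlap=2`, the string `"abcdefghij"` comes out as `abcd`, `cdef`, `efgh`, `ghij`. The trade-off is that a larger overlap protects longer passages from being split, but it also stores and embeds more duplicated text.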