HN Mirror



	by behnamoh 1122 days ago
	When you split a document into chunks, doesn't some crucial information get cut in half? In that case you would probably lose it from the context: if the relevant passage is immediately followed by irrelevant text in the same chunk, that text lowers the chunk's cosine similarity to the query, so the chunk may never be retrieved. Is there a "smarter" way to feed documents as context to LLMs?

1 comment

haolez 1122 days ago

I don't know if there is a smarter way, but these libraries usually offer an overlap parameter that repeats the last N characters of one chunk at the start of the next chunk.
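
For illustration, here is a minimal sketch of character-level chunking with overlap in plain Python. It is not the API of any particular library; the function name chunk_with_overlap and the parameters chunk_size and overlap are hypothetical names chosen for this example.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters, where each
    chunk starts with the last `overlap` characters of the previous one.

    Hypothetical helper for illustration only, not a library function.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk made up only of repeated characters is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

With overlap, a sentence that straddles a chunk boundary has a better chance of appearing whole in at least one chunk, at the cost of storing and embedding some text twice.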
