by lsorber
576 days ago
You don’t have to reduce a long context to a single embedding vector. Instead, you can compute the token embeddings of the whole context and then pool those into, say, sentence embeddings. The benefit is that each sentence’s embedding is informed by all of the other sentences in the context. So when a sentence refers to “The company”, for example, its embedding will have captured which company that is, based on the other sentences in the context.

This technique is called ‘late chunking’ [1], and is based on another technique called ‘late interaction’ [2].

You can also combine late chunking (to pool token embeddings) with semantic chunking (to partition the document) for even better retrieval results. For an example implementation that applies both techniques, check out RAGLite [3].

[1] https://weaviate.io/blog/late-chunking

[2] https://jina.ai/news/what-is-colbert-and-late-interaction-an...

[3] https://github.com/superlinear-ai/raglite
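
To make the pooling step concrete, here's a minimal sketch of late chunking in plain Python. It assumes you already have one contextualized embedding per token from a long-context embedding model (not shown here), plus the token index range of each sentence. The function name `late_chunk`, the `(start, end)` span format, and the toy vectors are all made up for illustration, and mean pooling is just one reasonable choice; this is not how RAGLite or any particular library implements it.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def late_chunk(
    token_embeddings: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Mean-pool contextualized token embeddings into one vector per span.

    `spans` holds half-open (start, end) token index ranges, one per sentence.
    Hypothetical helper for illustration only.
    """
    pooled: List[Vector] = []
    for start, end in spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span ({start}, {end})")
        window = token_embeddings[start:end]
        dim = len(window[0])
        # Average each dimension over the tokens that belong to this sentence.
        pooled.append([sum(tok[d] for tok in window) / len(window) for d in range(dim)])
    return pooled


# Toy stand-in for the output of a long-context embedding model:
# 5 tokens, 2 dimensions each. In practice these come from ONE forward
# pass over the whole document, so every token has seen every sentence.
token_embeddings = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0], [4.0, 0.0]]

# Assumed sentence boundaries: tokens 0-1 are sentence one, tokens 2-4 sentence two.
sentence_spans = [(0, 2), (2, 5)]

sentence_embeddings = late_chunk(token_embeddings, sentence_spans)
```

The important part is the order of operations: the model embeds the full context first and the chunking happens afterwards, on the token embeddings. With ordinary chunking you would split the text first and embed each sentence on its own, which is where the context gets lost.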