by egorfine
972 days ago
I fail to imagine an 8k-token piece of text that has just one single semantic coordinate and is appropriate for embedding and vector search. In my experience, any text is better embedded using a sliding window of a few dozen words. This is the approximate size of semantic units in a written document in English, although it will differ wildly across texts and topics.
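
To make the sliding-window idea concrete, here is a rough sketch of the chunking step only. The function name `sliding_windows` and the defaults (40-word window, 20-word stride) are my own assumptions for illustration, not tuned values; each resulting chunk would then be passed to whatever embedding model you use.

```python
from typing import List


def sliding_windows(text: str, window: int = 40, stride: int = 20) -> List[str]:
    """Split text into overlapping chunks of at most `window` words.

    Hypothetical helper: `window` and `stride` are illustrative defaults.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")

    words = text.split()  # naive whitespace tokenization
    chunks: List[str] = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + window]))
        # Stop once a chunk has reached the end of the text,
        # so we do not emit trailing chunks that are pure overlap.
        if start + window >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(100))
    for chunk in sliding_windows(sample):
        print(len(chunk.split()), chunk[:30])
```

A stride of half the window means every word appears in two chunks (except near the edges), so a semantic unit that straddles one chunk boundary is still likely to land whole in the neighbouring chunk.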
I can see a sliding window working for semantic search and RAG, but not so much for clustering or finding related documents.