by DougBTX
875 days ago
Embeddings are a type of lossy compression, so roughly speaking, using more embedding bytes for a document preserves more information about what it contains. Typically documents are broken down into chunks, then the embedding for each chunk is stored, so longer documents are represented by more embeddings. Going further down the AI == compression path, there’s: http://prize.hutter1.net/
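To make the "longer documents are represented by more embeddings" point concrete, here is a rough sketch. The chunk size (200 characters), embedding dimension (768), and float32 storage are all assumptions picked for illustration, and `chunk_text` and `embedding_bytes` are hypothetical helpers, not any particular library's API.

```python
def chunk_text(text, chunk_size=200):
    """Split text into fixed-size character chunks (the size is an assumption)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def embedding_bytes(num_chunks, dim=768, bytes_per_value=4):
    """Storage for one float32 vector of length `dim` per chunk (assumed layout)."""
    return num_chunks * dim * bytes_per_value


short_doc = "a" * 500
long_doc = "a" * 5000

for name, doc in [("short", short_doc), ("long", long_doc)]:
    chunks = chunk_text(doc)
    print(name, len(chunks), "chunks,", embedding_bytes(len(chunks)), "bytes")
```

Under these assumptions the stored bytes grow linearly with document length, since every extra chunk adds one more fixed-size vector.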
Always felt they're more like hashes/fingerprints for the RAG use cases.
> Typically documents are broken down into chunks
That's what I would have guessed. It's still surprising that the embeddings don't fit into RAM though.
That said (the following I just realized), even if the embeddings don't all fit into RAM at the same time, you really don't need to load them all into RAM if you're just performing a linear scan and computing cosine similarity on each of them. Sure, it may be slow to load tens of GB of embedding data... but at that point I'd be wondering what kind of textual data one could feasibly have that goes into the terabyte range. (Also, generating that many embeddings requires a lot of compute!)
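A minimal sketch of that linear scan, using only the standard library. It assumes the embeddings sit in a flat binary file of float32 rows, each the same length as the query, with no header; that file layout, the function name `top_k_cosine`, and the batch size are my assumptions, not something from the thread. Only one batch of rows is in memory at a time, plus a small heap holding the best matches so far.

```python
import heapq
import math
from array import array


def top_k_cosine(path, query, k=5, batch_rows=1024):
    """Stream float32 rows from `path` and return the k most similar to `query`.

    Returns a list of (similarity, row_index) pairs, best first.
    """
    dim = len(query)
    q_norm = math.sqrt(sum(x * x for x in query))
    row_bytes = dim * 4  # assumes 4-byte floats, one row per embedding
    best = []  # min-heap of (similarity, row_index), at most k entries
    row_index = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(row_bytes * batch_rows)  # one batch at a time
            if not data:
                break
            values = array("f")
            values.frombytes(data)
            for start in range(0, len(values), dim):
                row = values[start:start + dim]
                dot = sum(a * b for a, b in zip(row, query))
                norm = math.sqrt(sum(a * a for a in row))
                sim = dot / (norm * q_norm) if norm and q_norm else 0.0
                if len(best) < k:
                    heapq.heappush(best, (sim, row_index))
                elif sim > best[0][0]:
                    heapq.heapreplace(best, (sim, row_index))
                row_index += 1
    return sorted(best, reverse=True)
```

Memory use is bounded by `batch_rows` rather than by the file size, so the scan is limited by disk read speed, which is the "slow but feasible" trade-off described above.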