| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by skeptrune 678 days ago

I'm not a fan of this technique. I agree the scenario they lay out is a common problem, but the proposed solution feels odd.

Vector embeddings have bag-of-words compression properties and can over-index on the first newline separated text block to the extent that certain indices in the resulting vector end up much closer to 0 than they otherwise would. With quantization, they can eventually become 0 and cause you to lose out on lots of precision with the dense vectors. IDF search overcomes this to some extent, but not enough.

You can "semantically boost" embeddings such that they move closer to your document's title, summary, abstract, etc. and get the recall benefits of this "context" prepend without polluting the underlying vector. Implementation wise it's a weighted sum. During the augmentation step where you put things in the context window, you can always inject the summary chunk when the doc matches as well. Much cleaner solution imo.

Description of "semantic boost" in the Trieve API[1]:

>semantic_boost: Semantic boost is useful for moving the embedding vector of the chunk in the direction of the distance phrase. I.e. you can push a chunk with a chunk_html of "iphone" 25% closer to the term "flagship" by using the distance phrase "flagship" and a distance factor of 0.25. Conceptually it's drawing a line (euclidean/L2 distance) between the vector for the innerText of the chunk_html and distance_phrase then moving the vector of the chunk_html distance_factorL2Distance closer to or away from the distance_phrase point along the line between the two points.

[1]:https://docs.trieve.ai/api-reference/chunk/create-or-upsert-...

1 comments

torginus 677 days ago

Sorry random question - do vector dbs work across models? I'd guess no, since embeddings are models specific afaik, but that means that a vector db would lock you into using a single LLM and even within that, a single version, like Claude-3.5 Sonnet, and you couldn't move to 3.5 Haiku, Opus etc., never mind ChatGPT or Llama without reindexing.

link

rvnx 677 days ago

In short: no.

The vector databases are here to store vectors and calculating distance between vectors.

The embeddings model is the model that you pick to generate these vectors from a string or an image.

You give "bart simpson" to an embeddings model and it becomes (43, -23, 2, 3, 4, 843, 34, 230, 324, 234, ...)

You can imagine it like geometric points in space (well, it's a vector though), except that instead of being 2D, or 3D-space, they are typically in higher-number of dimensions (e.g: 768).

When you want to find similar entries, you just generate a new vector "homer simpson" (64, -13, 2, 3, 4, 843, 34, 230, 324, 234, ...) and send it to the vector database and it will return you all the nearest neighbors (= the existing entries with the smallest distance).

To generate these vectors, you can use any model that you want, however, you have to stay consistent.

It means that once you are using one embedding model, you are "forever" stuck with it, as there is no practical way to project from one vector space to another.

link

torginus 677 days ago

that sucks :(. I wonder if there are other approaches to this, like simple word lookup, with storing a few synonyms, and prompting the LLM to always use the proper technical terms when performing a lookup.

link

kordlessagain 677 days ago

Back of the book index or inverted indexes can be stored in a set store and give decent results that compare to vector lookups. The issue with them is you have to do an extraction inference to get the keywords.

link

sebastiennight 676 days ago

The sibling comments seem to be correct in their technical explanations, but miss the meaning I'm getting from your question.

My understanding is you want to know "are vector DBs compatible with specific LLMs, or are we stuck with a specific LLM if we want to do RAG once we've adopted a specific vector store?"

And the answer to that is that the LLM never sees the vectors from your DB. Your LLM only sees what you submit as context (ie the "system" and "user" prompts in chat-based models).

The way RAG works is:

1 - end-user submits a query

2 - this query is embedded (with the same model that was used to compile the vector store) and compared (in the vector store) with existing content, to retrieve relevant chunks of data

3 - and then this data (in the form of text segments) is passed to the LLM along with the initial query.

So, in a sense you're "locked in" in the sense that you need to use the same embedding model for storage and for retrieval. But you can definitely swap out the LLM for any other LLM without reindexing.

An easy way to try this behavior out as a layperson is to use AnythingLLM which is an open-source desktop client, that allows you to embed your own documents and use RAG locally with open-weight models or swap out any of the LLM APIs.

link

passion__desire 677 days ago

Embedding is a transformation which allows us to find semantically relevant chunks from a catalogue given a query. Through some nearness criteria, you would retrieve "semantically relevant" chunks which along with query would be fed to LLMs and ask them to synthesize the best answer. Vespa docs are very great if you are thinking of building in this space. Retrieval part is independent of synthesis, hence it has its separate leaderboard on huggingface.

https://docs.vespa.ai/en/embedding.html

https://huggingface.co/spaces/mteb/leaderboard

link