
You can use embedding models (OpenAI offers some, and if I recall correctly there are open-source, self-hostable ones that perform better) that take a sentence or a paragraph and output a vector. These vectors are called 'embeddings'.

You then put those vectors in a vector database (e.g. Pinecone, pgvector, Chroma).

To run a search, you generate an embedding of the search term (either the raw user query, or a version that a model like ChatGPT was asked to rewrite), then query the vector database for the n closest vectors. The trick is picking a model that generates good vectors for search, and transforming the user's query into text whose embedding is useful to search against. If you're feeding the results into an LLM's context, the next step is getting your prompt right and not overloading the model with unrelated information (i.e. bad search results).
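To make the flow concrete, here's a minimal sketch of embed, store, and query for the n closest vectors. Everything in it is an assumption for illustration: `toy_embed` is a made-up word-count stand-in for a real embedding model, the "database" is a plain Python list, and closeness is measured with cosine similarity (a common choice, but not the only one).

```python
import math
from collections import Counter

def toy_embed(text, vocab):
    # Hypothetical stand-in for a real embedding model:
    # one dimension per vocabulary word, valued by word count.
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_n(query_vec, index, n):
    # Brute-force nearest-neighbour search over (document, vector) pairs.
    scored = [(cosine(query_vec, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]

docs = [
    "the cat sat on the mat",
    "dogs chase the ball in the park",
    "stock prices fell sharply today",
]
vocab = sorted({word for doc in docs for word in doc.lower().split()})

# "Vector database": here just a list of (document, embedding) pairs.
index = [(doc, toy_embed(doc, vocab)) for doc in docs]

# Embed the search term the same way, then fetch the 2 closest documents.
results = top_n(toy_embed("cat on a mat", vocab), index, n=2)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

A real setup would swap `toy_embed` for a model call and the list for one of the databases above, which use approximate indexes rather than scanning every vector. The word-count stand-in only matches on shared words, which is exactly the limitation real embeddings are meant to get past.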

The key is that the vector representation encodes language concepts as distance: texts with related meanings end up with vectors close to one another. An easy way to get a feel for this is to look at single-word embeddings. Computerphile have a great episode on it[1]. You can take the vector for 'King', subtract the vector for 'Man' and add the vector for 'Woman', and the closest vector to the result will likely be 'Queen'. Scale this idea up to whole paragraphs (and larger vectors as a result).
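Here's a rough sketch of that King/Man/Woman arithmetic. The three-dimensional vectors are hand-picked for illustration (loosely "royalty", "male", "female") and are not taken from any trained model; real word embeddings have hundreds of dimensions with no such neat labels. The input words are excluded from the candidates, which is the usual convention for this kind of analogy lookup.

```python
import math

# Hand-made toy vectors: [royalty, male, female]. Purely illustrative.
vectors = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.7, 0.1],
}

def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(positive, negative, vectors):
    # Add the 'positive' word vectors, subtract the 'negative' ones,
    # then return the closest remaining word by cosine similarity.
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# king - man + woman
best = analogy(positive=["king", "woman"], negative=["man"], vectors=vectors)
print(best)
```

With these particular toy vectors the result is 'queen', because subtracting 'man' removes most of the "male" component and adding 'woman' puts the "female" component in its place while "royalty" stays high.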

LangChain has an example of searching a database of facts[2], although I find their documentation pretty inaccessible: they explain their library, but don't step back from the weeds of what they're doing to really explain why, or what's going on. Many of the features LangChain implements are distilled from LLM papers (or sometimes simply lifted from them and packaged as a toolkit you can apply directly).

1: Computerphile Word Embeddings https://www.youtube.com/watch?v=gQddtTdmG_8

2: https://langchain.readthedocs.io/en/latest/use_cases/questio...