
The article starts off by creating in-memory vector embeddings for a list of "documents" (aka chunks of text) using the nomic model:

https://www.nomic.ai/blog/posts/nomic-embed-text-v1

They then use cosine similarity to compare the embedding of the user's query against the stored embeddings and retrieve the "top N" matches, which point back to the doc chunks. Those chunks get shoved into the prompt that is sent off to the LLM. RAG is just a means of injecting "relevant" docs into your input context - no training or special LLMs are required.
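A rough sketch of that retrieve-then-inject step, in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings (a real model such as nomic-embed-text would produce them), and the function names and prompt wording are hypothetical, not taken from the article:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_n(query_vec, doc_vecs, n=2):
    """Return the indices of the n doc vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:n]]

def build_prompt(question, chunks):
    """Paste the retrieved chunks into the text that goes to the LLM."""
    context = "\n\n".join(chunks)
    return f"Answer using this context.\n\nContext:\n{context}\n\nQuestion: {question}"

# Toy data: invented vectors, NOT real embeddings.
docs = [
    "Llamas are camelids.",
    "Python is a programming language.",
    "Llamas live in the Andes.",
]
doc_vecs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.9, 0.1, 0.3]]
query_vec = [1.0, 0.0, 0.2]  # pretend this is the embedded question

hits = top_n(query_vec, doc_vecs, n=2)
prompt = build_prompt("Where do llamas live?", [docs[i] for i in hits])
print(prompt)
```

The LLM itself never sees the vectors; it only receives the final prompt string with the selected chunks pasted in.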

To do what you're asking, you'd need to fetch a recent Wikipedia dump, spend some time normalizing and sanitizing it, and, perhaps most importantly, figure out how to divide each article (some of which would far exceed the embedding model's maximum input length) into chunks that you could generate embeddings from. Then you'd need to store them in a vector database (unlike the article, which does not persist them). I personally use Qdrant; there's also PostgreSQL (with the pgvector extension), LanceDB, etc.
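For the chunking part, a minimal sketch of one common approach: fixed-size windows with some overlap so that sentences straddling a boundary show up in both neighbors. It counts whitespace-separated words as a crude stand-in for tokens (real embedding models limit input by tokens, so you'd want the model's tokenizer for anything serious), and the function name and default sizes are arbitrary assumptions:

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping chunks of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Tiny demonstration with a 10-word "article".
article = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(article, max_words=4, overlap=1)
for c in chunks:
    print(c)
```

Each chunk would then be embedded and written to the vector store along with a pointer back to its source article.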