by JKCalhoun 493 days ago
Curious about RAGs. The article made it look like just a few additional parameters (context) you pass to the LLM. Somehow I was under the impression RAGs required training.

All I want is an LLM front-end to a local Wikipedia drop.

2 comments

The article starts off by creating in-memory vector embeddings for a list of "documents" (aka chunks of text) using the nomic model:

https://www.nomic.ai/blog/posts/nomic-embed-text-v1

They then embed the user's query the same way and use cosine similarity to compare it against the stored embeddings, retrieving the "top N" matches, each of which points back to a doc chunk. Those chunks get shoved into the prompt alongside the user's question, and the whole thing is sent off to the LLM. RAG is just a means of injecting "relevant" docs into your input context - no training or special LLMs are required.
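A rough sketch of that retrieve-then-inject flow, in case it helps. This is not the article's code: `toy_embed` is a made-up bag-of-words stand-in for a real embedding model such as nomic, and the prompt wording in `build_prompt` is just an assumption for illustration.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_n(query, docs, n=2):
    """Return the n doc chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:n]

def build_prompt(query, docs, n=2):
    """Inject the retrieved chunks into the text that goes to the LLM."""
    context = "\n".join(f"- {d}" for d in top_n(query, docs, n))
    return f"Use the following context to answer.\n{context}\n\nQuestion: {query}"

docs = [
    "llamas are members of the camelid family",
    "the eiffel tower is in paris",
    "llamas eat grass and hay",
]
print(build_prompt("what do llamas eat", docs, n=2))
```

The string returned by `build_prompt` is all the LLM ever sees, which is why no training step is involved.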

To do what you're asking, you'd need to fetch a recent Wikipedia dump, spend some time normalizing/sanitizing it, and, perhaps most importantly, figure out how to divide each article (some of which would well exceed the embedding model's input limit) into chunks that you could generate embeddings from. Then you'd need to store them in a vector database (unlike the article, which does not persist them). I personally use Qdrant; there are also PostgreSQL (with the pgvector extension), LanceDB, etc.
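For the chunking part, one simple approach (an assumption on my part, not something the article prescribes) is fixed-size word windows with a small overlap so that sentences straddling a boundary still show up whole in at least one chunk. The `max_words` and `overlap` defaults below are arbitrary; in practice you'd size them against your embedding model's token limit.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping chunks of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks

article = " ".join(str(i) for i in range(10))
print(chunk_words(article, max_words=4, overlap=1))
```

Splitting on article sections or paragraphs first and only falling back to word windows for oversized sections would probably retrieve better, at the cost of a bit more parsing of the dump.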

No training needed, but you need to generate embeddings for all the content, store it in a vector DB, and wire it up.
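To make the "store it and wire it up" step concrete, here is a minimal sketch that persists chunks and their vectors in SQLite and does a brute-force cosine search. SQLite is only a stand-in so the example runs with the standard library; a real vector DB (Qdrant, pgvector, LanceDB) would use an index instead of scanning every row. The table layout and the helper names `open_store`, `add_chunk`, and `search` are made up for illustration.

```python
import json
import math
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a tiny chunk store; pass a file path to persist it."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(id INTEGER PRIMARY KEY, text TEXT, vec TEXT)"
    )
    return db

def add_chunk(db, text, vec):
    """Store a chunk with its embedding serialized as JSON."""
    db.execute("INSERT INTO chunks (text, vec) VALUES (?, ?)", (text, json.dumps(vec)))
    db.commit()

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(db, query_vec, n=3):
    """Brute-force scan: score every stored chunk, return the top n texts."""
    rows = db.execute("SELECT text, vec FROM chunks").fetchall()
    scored = [(cosine(query_vec, json.loads(vec)), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:n]]

db = open_store()
add_chunk(db, "chunk about llamas", [1.0, 0.0])   # vectors here are made up
add_chunk(db, "chunk about paris", [0.0, 1.0])
print(search(db, [0.9, 0.1], n=1))
```

The vectors passed to `add_chunk` would come from whatever embedding model you pick, and the query has to be embedded with the same model for the similarity scores to mean anything.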