Hacker News
by claritise 804 days ago
Just create a RAG with Wikipedia as the corpus and a low parameter model to run it and you can basically have an instantly queryable corpus of human knowledge runnable on an old raspberry pi.

> a low parameter model

> on an old raspberry pi

I bet the LLM responses will be great... You're better off just opening up a raw text dump of Wikipedia markup files in vim.

But which model to tokenize with? Is there a leaderboard for models that are good for RAG?

“For RAG” is ambiguous.

First, there is a leaderboard for embeddings. [1]

Even then, it depends on how you use them. Some embeddings pack the most signal into the leading dimensions, so you can truncate the vector, while most cannot. You might want that truncated version for a fast, dirty index. The same goes for using multiple models of differing vector sizes for the same content.
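
To make the truncation idea concrete, here is a rough sketch of a two-stage lookup: shortlist with the first few dimensions, then re-rank the shortlist with the full vectors. The vectors are made-up toy values, not output from any real model, and the helper names (`truncate`, `two_stage_search`) are hypothetical. It only makes sense for embeddings that were trained to survive truncation.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def truncate(vec, dims):
    """Keep the first `dims` components, then re-normalize the shorter vector."""
    return normalize(vec[:dims])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(query, docs, coarse_dims, shortlist, top_k):
    """Shortlist with truncated vectors, then re-rank the shortlist with full vectors."""
    q_full = normalize(query)
    q_short = truncate(query, coarse_dims)
    # Stage 1: cheap scores on truncated vectors.
    # A real index would store the truncated vectors up front instead of
    # recomputing them per query.
    coarse = sorted(
        docs,
        key=lambda name: dot(q_short, truncate(docs[name], coarse_dims)),
        reverse=True,
    )[:shortlist]
    # Stage 2: full-vector scores, computed for the shortlist only.
    return sorted(
        coarse,
        key=lambda name: dot(q_full, normalize(docs[name])),
        reverse=True,
    )[:top_k]

# Toy 4-dimensional vectors (assumed values for illustration only).
docs = {
    "raspberry_pi": [0.9, 0.1, 0.3, 0.0],
    "vim": [0.1, 0.9, 0.0, 0.2],
    "wikipedia": [0.7, 0.3, 0.1, 0.6],
}
query = [0.8, 0.2, 0.3, 0.1]
results = two_stage_search(query, docs, coarse_dims=2, shortlist=2, top_k=1)
```

The trade-off is that stage 1 can drop a document that the full vectors would have ranked highly, so the shortlist size is the knob to tune.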

Do you preprocess your text? There will be a model there. Likely the same model you would use to process the query.

There is a model for asking questions from context. Sometimes that is a different model. [2]
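
As a sketch of that last step, assuming retrieval has already happened: the retrieved passages get pasted into a prompt and handed to whichever model does the answering. `build_prompt`, `answer`, and the `retrieve` / `generate` callables are hypothetical placeholders, not any particular library's API, and the prompt wording is just one plausible choice.

```python
def build_prompt(question, passages):
    """Paste numbered passages into a prompt that asks for a grounded answer."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )

def answer(question, retrieve, generate, k=3):
    """Retrieve up to k passages, then ask the generator model about them.

    `retrieve` maps a question to a ranked list of passages and `generate`
    maps a prompt string to text. Both are stand-ins supplied by the caller,
    so the retriever and the answering model can be different models.
    """
    passages = retrieve(question)[:k]
    return generate(build_prompt(question, passages))

# Stand-in components so the sketch runs without any model installed.
def fake_retrieve(question):
    return ["Vim is a text editor.", "Wikipedia is an online encyclopedia."]

def fake_generate(prompt):
    return f"(model output for a {len(prompt)}-character prompt)"

prompt = build_prompt("What is Vim?", fake_retrieve("What is Vim?"))
reply = answer("What is Vim?", fake_retrieve, fake_generate, k=1)
```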