by blizzardman 819 days ago
Thank you for the recommendation. RAG would make sense. Here is my understanding of how to do it:

1. Use a sentence transformer to turn the entire Harry Potter or Lord of the Rings book into embeddings.

2. Turn the query into an embedding, e.g. "why didn't Gandalf send the eagles?"

3. Find the most relevant text with an approximate nearest neighbor (ANN) search, comparing the query embedding against the book embeddings.

4. Pass the retrieved context plus the query to Llama.
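The four steps above can be sketched in plain Python. This is a minimal sketch under several assumptions: `embed` is a hypothetical toy stand-in (a hashed bag of words) for a real sentence-transformer model, the book is split into paragraphs so that each vector represents one passage rather than the whole book, the nearest-neighbor step is a brute-force cosine search instead of a real ANN index, and the actual call to Llama is left out. `index_book`, `retrieve` and `build_prompt` are illustrative names, not any library's API.

```python
import math
import re
import zlib

DIM = 4096  # size of the toy embedding vectors (assumption)


def embed(text: str) -> list[float]:
    """Toy stand-in for a sentence transformer: hashed bag of words, unit-normalized.

    A real pipeline would call an embedding model here; this only captures
    word overlap, not meaning.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z']+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# Step 1: split the book into passages (here: paragraphs) and embed each one.
def index_book(book: str) -> tuple[list[str], list[list[float]]]:
    chunks = [p.strip() for p in book.split("\n\n") if p.strip()]
    return chunks, [embed(c) for c in chunks]


# Steps 2-3: embed the query and return the k most similar passages.
# Brute-force cosine search; an ANN index (e.g. FAISS or hnswlib) approximates
# this same ranking much faster on large collections.
def retrieve(query: str, chunks, vectors, k: int = 3) -> list[str]:
    q = embed(query)

    def score(i: int) -> float:
        # Dot product equals cosine similarity because all vectors are unit length.
        return sum(a * b for a, b in zip(q, vectors[i]))

    best = sorted(range(len(chunks)), key=score, reverse=True)[:k]
    return [chunks[i] for i in best]


# Step 4: combine the retrieved context and the question into one prompt.
def build_prompt(query: str, context: list[str]) -> str:
    passages = "\n---\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{passages}\n\nQuestion: {query}\nAnswer:"
    )


book = (
    "The eagles of the Misty Mountains rescued Gandalf and the dwarves.\n\n"
    "Frodo and Sam crossed the Dead Marshes with Gollum.\n\n"
    "The Shire was quiet and green in the spring."
)
chunks, vectors = index_book(book)
query = "why didn't Gandalf send the eagles"
prompt = build_prompt(query, retrieve(query, chunks, vectors, k=1))
print(prompt)  # this string would then be sent to Llama (not shown)
```

Swapping in a real embedding model and an ANN index changes only `embed` and `retrieve`; the overall shape of the pipeline stays the same.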

However, the results are not very good. Am I missing something in RAG?

1 comment

Make sure you're using a state-of-the-art (SOTA) embedding model (e.g. UAE, text-embedding-ada-002) that can create a vector from a reasonably large number of tokens. See the MTEB leaderboard for comparisons: https://huggingface.co/spaces/mteb/leaderboard

Experiment with a "sliding scale" of chunk sizes across the book (paragraphs, pages, etc.). You can also try using a graph to relate sections of the book to each other.
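One way to read the "sliding scale" and graph suggestions, sketched in Python under stated assumptions: chunks are fixed-size word windows with an overlap (a real setup might split on paragraphs or pages instead, and count model tokens rather than words), and the "graph" is the simplest possible one, linking each chunk to its immediate neighbors so a retrieved hit can be expanded with the text around it. `sliding_window_chunks`, `neighbor_graph` and `expand_hits` are hypothetical helpers.

```python
def sliding_window_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def neighbor_graph(n_chunks: int) -> dict[int, set[int]]:
    """Link each chunk to the chunk immediately before and after it (hypothetical, simplest graph)."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n_chunks} for i in range(n_chunks)}


def expand_hits(hits: list[int], graph: dict[int, set[int]]) -> list[int]:
    """Add the graph neighbors of each retrieved chunk so the LLM sees the surrounding text."""
    expanded = set(hits)
    for h in hits:
        expanded |= graph[h]
    return sorted(expanded)


print(sliding_window_chunks("a b c d e f g", size=4, overlap=2))
# ['a b c d', 'c d e f', 'e f g']
print(expand_hits([0], neighbor_graph(3)))
# [0, 1]
```

Which window size and overlap work best depends on the book and the questions, so it is worth measuring rather than guessing; a richer graph could also link chunks by chapter or by shared characters.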

Consider setting up a tuner: a set of well-defined questions and answers that you use to search for the best embedding setup.
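A tuner of this kind might look like the following minimal sketch. It assumes the gold set pairs each question with a snippet the correct passage must contain, scores each configuration by hit rate (the fraction of questions whose retrieved passages contain that snippet), and grid-searches over chunk size only. `make_overlap_retriever` is a hypothetical word-overlap retriever standing in for a real embedding-plus-ANN retriever; in practice the configurations could also vary the embedding model, the overlap, and k.

```python
from typing import Callable

# A gold-set item: (question, snippet the correct passage should contain).
QAPair = tuple[str, str]


def hit_rate(qa_pairs: list[QAPair], retrieve: Callable[[str], list[str]]) -> float:
    """Fraction of questions whose retrieved passages contain the expected snippet."""
    hits = 0
    for question, expected in qa_pairs:
        if any(expected.lower() in passage.lower() for passage in retrieve(question)):
            hits += 1
    return hits / len(qa_pairs)


def tune(qa_pairs, configs, make_retriever):
    """Grid search: build one retriever per config, score it, return the best config."""
    scores = {cfg: hit_rate(qa_pairs, make_retriever(cfg)) for cfg in configs}
    best = max(scores, key=scores.get)  # ties go to the first config listed
    return best, scores


def make_overlap_retriever(text: str, k: int = 1):
    """Hypothetical stand-in retriever: fixed-size word chunks ranked by word overlap."""
    words = text.split()

    def factory(chunk_size: int):
        chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

        def retrieve(question: str) -> list[str]:
            q = set(question.lower().split())
            ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
            return ranked[:k]

        return retrieve

    return factory


book = (
    "The eagles rescued Gandalf from the tower of Orthanc. "
    "Frodo carried the ring across the Dead Marshes with Sam and Gollum. "
    "Bilbo left the Shire after his birthday party."
)
qa = [
    ("who rescued gandalf from orthanc", "eagles"),
    ("who crossed the dead marshes with frodo", "gollum"),
    ("when did bilbo leave the shire", "birthday"),
]
best, scores = tune(qa, [4, 8, 16], make_overlap_retriever(book))
print(best, scores)
```

The same loop can search over other settings by making each config a tuple (for example model name, chunk size, overlap) and building the retriever from it.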