Hacker News
by discordance 844 days ago
RAG:

1. First you create embeddings from your documents

2. Store those in a vector db

3. Take the user's query, embed it the same way, and search the vector db for the closest chunks (cosine similarity etc)

4. Feed the relevant search results (the document chunks behind the matching embeddings, not the embeddings themselves) to your LLM as context and do the usual LLM stuff
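A rough sketch of those four steps, in case it helps. Everything here is a stand-in: `embed` is a toy bag-of-words counter rather than a real embedding model, `ToyVectorStore` is a hypothetical in-memory list rather than an actual vector db, and the final prompt is only printed instead of being sent to an LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory replacement for a vector db."""

    def __init__(self):
        self.rows = []  # list of (vector, chunk) pairs

    def add(self, chunk):
        # Steps 1 and 2: embed the chunk and store it next to its text.
        self.rows.append((embed(chunk), chunk))

    def search(self, query, k=2):
        # Step 3: embed the query and rank stored chunks by cosine similarity.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(question, chunks):
    # Step 4: the LLM receives the retrieved text, not the vectors.
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
for doc in [
    "The warranty covers battery defects for two years.",
    "Shipping takes five business days within Europe.",
    "Returns are accepted within thirty days of delivery.",
]:
    store.add(doc)

question = "How long is the battery warranty?"
top_chunks = store.search(question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Swapping in a real embedding model and a real vector db changes `embed` and `ToyVectorStore`, but the shape of the loop stays the same.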


Although RAG is often implemented via vector databases to find 'relevant' content, I'm not sure that's a necessary component. I've been doing what I call RAG by finding 'relevant' content for the current prompt context via a number of different algorithms that don't use vectors.

Would you define RAG only as 'prompt optimisation that involves embeddings'?
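For illustration, here is one way retrieval can work with no vectors at all. The comment above doesn't say which algorithms are used, so this is purely an assumed example: a small BM25-style keyword ranker written from scratch, with the usual default parameters `k1=1.5` and `b=0.75` and made-up sample documents.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank docs against query with a BM25-style keyword score (no embeddings)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalised by document length.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)

    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [(docs[i], scores[i]) for i in order]


docs = [
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first day of each month.",
    "Password rules: at least twelve characters.",
]
ranked = bm25_rank("how do I reset my password", docs)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

The top results would then go into the prompt exactly as they would after a vector search, which is why this still looks like RAG to me.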

Sure thing, your RAG approach sounds intriguing, especially since you're sidestepping vector databases. But doesn't the input context length cap affect it? (ChatGPT Plus at 32K [0] or GPT-4 via OpenAI at 128K [1]) Seems like cases that hit the cap would be pretty rare though.

[0]: https://openai.com/chatgpt/pricing#:~:text=8K-,32K,-32K

[1]: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turb...

Yes, context window is a limiting factor, but that's true however you identify the content to augment generation.
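To make that concrete, here is a minimal sketch of fitting already-ranked chunks into a fixed token budget, which applies whether the ranking came from vectors or anything else. The helper names are hypothetical, and `estimate_tokens` uses a rough assumed heuristic of about 4 characters per token; a real tokenizer would give different counts.

```python
def estimate_tokens(text):
    # Assumption: roughly 4 characters per token. Use a real tokenizer for real budgets.
    return max(1, len(text) // 4)


def pack_chunks(ranked_chunks, budget_tokens):
    """Greedily keep the highest-ranked chunks that still fit in the budget."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # too big for what is left; a smaller, lower-ranked chunk may still fit
        selected.append(chunk)
        used += cost
    return selected, used


# Chunks are assumed to arrive best-first from whatever retrieval step was used.
chunks = ["a" * 400, "b" * 4000, "c" * 200]
selected, used = pack_chunks(chunks, budget_tokens=200)
print(len(selected), used)
```

In practice the budget would be the model's context window minus room for the question, the instructions, and the answer.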