by cjbprime 925 days ago
You don't need to train the model on your data: you can use retrieval augmented generation to add the relevant documents to your prompt at query time.
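As a rough illustration of what "adding the relevant documents to your prompt at query time" can look like, here is a minimal sketch. It assumes a toy bag-of-words cosine similarity in place of real embeddings, and the names `retrieve` and `build_prompt` are hypothetical helpers, not part of any library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    # Rank documents by similarity to the query and keep the top k.
    # A real system would likely use learned embeddings here (assumption).
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    # Paste the retrieved documents into the prompt at query time;
    # the model itself is never trained on them.
    context = "\n\n".join(retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


docs = [
    "The office is closed on public holidays.",
    "Expense reports are due on the fifth of each month.",
    "The cafeteria serves lunch from noon to two.",
]
prompt = build_prompt("When are expense reports due?", docs, k=1)
```

The resulting `prompt` string is what you would send to whichever LLM you use; the model call itself is left out because it depends on your setup.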
2 comments

This works if the document plus prompt fit in the context window. I suspect the most popular task for this workflow is summarization, which presumably means large documents. That's when you start scaling out to a vector store and implementing more advanced workflows. Sending a large document directly does work on certain local models, but even on the highest-tier MacBook Pro a large document can quickly choke any LLM and slow inference to a crawl. In other words, a powerful client is still required no matter what. Even if you generate embeddings in "real time" and dump them to a vector store, that process would be slow on most consumer hardware.

If you're passing in smaller documents, then it works pretty well for real-time feedback.
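To make the "fits in the context window" check concrete, here is a small sketch. It assumes a crude estimate of about 4 characters per token (real tokenizers vary by model) and a hypothetical `reserve_for_output` budget. When the document doesn't fit, it is split into overlapping word chunks, which is typically the step before embedding chunks into a vector store.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    # Crude heuristic (assumption): about 4 characters per token.
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(prompt, document, context_window, reserve_for_output=512):
    # True if prompt + document + room for the reply fit in the window.
    used = estimate_tokens(prompt) + estimate_tokens(document)
    return used + reserve_for_output <= context_window


def chunk_words(text, max_words=200, overlap=20):
    # Split text into chunks of at most max_words words, where
    # consecutive chunks share `overlap` words.
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

A small document passes `fits_in_context` and can go straight into the prompt; a large one gets chunked, and only the chunks relevant to the query are sent.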

Thank you for the explanation. I see there is still a lot I have to learn about LLMs.