by aznumeric 645 days ago
One way people keep costs down when using OpenAI with an offline RAG system is to limit the number of text snippets sent to the API. Instead of sending the whole database, they typically retrieve only the top 10 or so most relevant snippets from the vector database and send just those to OpenAI. Since OpenAI bills by the token, this significantly cuts the amount of text processed and billed per request.
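
For anyone who wants to see the shape of it, here is a rough sketch of the top-k step in plain Python. Everything in it is made up for illustration: the `store` list stands in for a real vector database, the three-dimensional vectors stand in for real embeddings, and `top_k_snippets` and `build_prompt` are hypothetical helper names. The actual OpenAI call is left out; the point is only that the prompt contains the k best snippets and nothing else.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_snippets(query_vec, store, k=10):
    """Return the text of the k snippets most similar to the query.

    `store` is a list of (text, vector) pairs, a stand-in for a vector DB.
    """
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, snippets):
    """Assemble the only text that would be sent to (and billed by) the API."""
    context = "\n\n".join(snippets)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy data: in practice the vectors would come from an embedding model.
store = [
    ("Invoices are due in 30 days.", [1.0, 0.0, 0.0]),
    ("The office is closed on Sundays.", [0.0, 1.0, 0.0]),
    ("Late invoices incur a 2% fee.", [0.9, 0.1, 0.0]),
]
query_vec = [1.0, 0.05, 0.0]  # assumed embedding of the question

snippets = top_k_snippets(query_vec, store, k=2)
prompt = build_prompt("When are invoices due?", snippets)
print(prompt)
```

With k=2 the unrelated snippet about office hours never reaches the prompt, so it is never billed. A real vector database does the ranking with an index instead of a full sort, but the cost logic is the same.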