| HN Mirror

	by aznumeric 693 days ago
	One way people keep costs down when using OpenAI with an offline RAG system is to limit the number of text snippets sent to the API. Instead of sending the whole database, they typically retrieve only the top 10 or so most relevant snippets from the vector database and send just those to OpenAI. Since the API bills per token, this significantly reduces the amount of data processed and billed.
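	To make that concrete, here is a rough sketch of the retrieval step, not anyone's actual setup. It assumes the snippets are already embedded and held in a plain in-memory list instead of a real vector database; the names `top_k_snippets`, `build_prompt`, and `store`, plus the toy 3-dimensional vectors, are all made up for illustration. Only the resulting prompt would be sent to the API.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_snippets(query_vec, store, k=10):
    """Return the k snippet texts most similar to the query, best first.

    `store` is a hypothetical stand-in for a vector database:
    a list of (text, embedding) pairs.
    """
    scored = ((cosine_similarity(query_vec, vec), text) for text, vec in store)
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [text for _, text in best]


def build_prompt(question, snippets):
    """Assemble a prompt that contains only the retrieved snippets."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy data: real embeddings would come from an embedding model.
store = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("Late invoices incur a 2% fee.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]

snippets = top_k_snippets(query_vec, store, k=2)
prompt = build_prompt("When are invoices due?", snippets)
```

	The unrelated snippet never makes it into `prompt`, so it never gets billed. With a real vector database the scan in `top_k_snippets` would be replaced by that database's own nearest-neighbor query.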