I've got an OpenAI API key, and I pay for ChatGPT. I'd imagine switching to this and using OpenAI would end up costing quite a lot. How are people running it relatively cheaply?
One way people keep costs down when using OpenAI with an offline RAG system is by limiting the number of text snippets sent to the API. Instead of sending the whole database, they typically retrieve only the top 10 (or so) most relevant snippets from the vector database and send just those to OpenAI. Since the API bills per token, this sharply cuts the number of input tokens OpenAI processes and charges for.
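Here is a minimal sketch of that top-k step in Python. It assumes the snippet embeddings are already computed and held in memory as plain lists of floats. The names `cosine`, `top_k_snippets`, and `build_prompt` are hypothetical. A real setup would get embeddings from an embedding model and let the vector database do the similarity search. Either way, only the selected snippets end up in the prompt sent to the API.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_snippets(
    query_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 10,
) -> List[str]:
    """Return the k snippet texts most similar to query_vec, best match first.

    store is a list of (snippet_text, embedding) pairs (hypothetical in-memory
    stand-in for a vector database).
    """
    scored = ((cosine(query_vec, emb), text) for text, emb in store)
    return [text for _, text in heapq.nlargest(k, scored)]


def build_prompt(question: str, snippets: List[str]) -> str:
    """Assemble a prompt from only the retrieved snippets, not the whole database."""
    context = "\n\n".join(snippets)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
```

Lowering `k` shrinks the prompt (and the bill) further, at the risk of leaving out a snippet the answer actually needed.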
OpenRouter. You get access to all the models, and it's not as expensive as you'd think. I spent $3 with aider the other day, in the blink of an eye, with Anthropic. I'm working on a FastHTML thingy and loaded all the docs, plus a few huge Replicate API files, into the vector database. Most of my back-and-forth usage averaged about $0.02 per turn with Claude 3.5 Sonnet. To give you an idea, my context plus prompt came to around 18,000 tokens, with completions around 1,500 tokens.
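To estimate your own per-turn cost from token counts like those, you can multiply each side by its per-token price. Providers quote prices per million tokens, separately for input and output. Below is a rough sketch; the function names are hypothetical. The prices in the example are placeholders, not real rates for any model, so look up the current pricing for whichever model you use.

```python
from typing import Iterable, Tuple


def turn_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given USD prices per million input/output tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def session_cost(turns: Iterable[Tuple[int, int]],
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Total cost over a sequence of (input_tokens, output_tokens) turns."""
    return sum(turn_cost(i, o, input_price_per_m, output_price_per_m) for i, o in turns)


# Placeholder prices (USD per million tokens): NOT real rates, substitute the model's actual pricing.
INPUT_PRICE_PER_M = 1.0
OUTPUT_PRICE_PER_M = 2.0
print(f"${turn_cost(18_000, 1_500, INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M):.4f} per turn")
```

With the token counts above, input tokens outnumber output tokens 12 to 1. So the size of the retrieved context usually drives the bill, unless the model's output price is more than 12 times its input price.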