True, this is how RAG works, but it is also why I prefer open-source LLMs for RAG: the token costs are less opaque, and I can control how many chunks I pull from the database to manage my costs.
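To make the chunk-control point concrete, here is a rough sketch of how I might cap what goes into the prompt. The function names, the `max_chunks` and `token_budget` parameters, and the word-count token estimate are all my own assumptions for illustration, not any particular library's API; a real setup would use the model's actual tokenizer.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    # A real tokenizer will give different (usually higher) counts.
    return len(text.split())


def select_chunks(
    ranked_chunks: List[str],
    max_chunks: int = 3,
    token_budget: int = 500,
) -> Tuple[List[str], int]:
    """Keep the top-ranked chunks until either limit is hit.

    ranked_chunks is assumed to be sorted best-first, e.g. the
    result of a vector database similarity search.
    """
    selected: List[str] = []
    used = 0
    for chunk in ranked_chunks[:max_chunks]:
        cost = estimate_tokens(chunk)
        if used + cost > token_budget:
            break  # stop before exceeding the budget
        selected.append(chunk)
        used += cost
    return selected, used


if __name__ == "__main__":
    chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
    picked, tokens = select_chunks(chunks, max_chunks=2, token_budget=10)
    print(picked, tokens)
```

Lowering `max_chunks` or `token_budget` trades answer context for a smaller, more predictable prompt size per query.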
I believe it will get better and more efficient as we go. On a side note, OpenAI seems to release products before they are ready and then improve them over time.