by yatz
764 days ago
The Assistants API is promising, but the earlier versions have many issues, especially with how costs are calculated. According to the OpenAI docs, you pay for data storage, a fixed price per API call, and token usage. It sounds straightforward until you start using it. Here is how it actually works. When you upload attachments (in my case a very large PDF), the system splits the file into small chunks and stores them in a vector database. The chunking does not seem to be very good: on every call, the system retrieves one large chunk or many chunks and sends them to the model along with your prompt. Those retrieved chunks are billed as input tokens, which inflated my per-request cost to roughly 10 times the cost of the prompt and response tokens combined. So be mindful of these hidden costs and monitor your usage.
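To make the arithmetic concrete, here is a minimal sketch of a per-request cost estimate, assuming retrieved chunks are billed as ordinary input tokens. The `Pricing` fields, the function names, and every price and token count below are hypothetical placeholders, not OpenAI's actual rates; storage fees are left out because they are not charged per request.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical placeholder prices, NOT OpenAI's published rates.
    input_per_1k: float   # USD per 1,000 input tokens
    output_per_1k: float  # USD per 1,000 output tokens
    per_call: float       # flat USD fee per API call, as described in the comment


def request_cost(p: Pricing, prompt_tokens: int, retrieved_tokens: int,
                 response_tokens: int) -> float:
    """Estimate one request's cost; retrieved chunks count as input tokens."""
    input_tokens = prompt_tokens + retrieved_tokens
    return (p.per_call
            + input_tokens / 1000 * p.input_per_1k
            + response_tokens / 1000 * p.output_per_1k)


def context_overhead(prompt_tokens: int, retrieved_tokens: int,
                     response_tokens: int) -> float:
    """Ratio of injected retrieval tokens to the tokens you wrote and read."""
    return retrieved_tokens / (prompt_tokens + response_tokens)


if __name__ == "__main__":
    pricing = Pricing(input_per_1k=0.01, output_per_1k=0.03, per_call=0.0)
    # Hypothetical request: short prompt and answer, large retrieved context.
    with_rag = request_cost(pricing, prompt_tokens=200, retrieved_tokens=5000,
                            response_tokens=300)
    without_rag = request_cost(pricing, prompt_tokens=200, retrieved_tokens=0,
                               response_tokens=300)
    print(f"with retrieval:    ${with_rag:.4f}")
    print(f"without retrieval: ${without_rag:.4f}")
    print(f"retrieved tokens / (prompt + response): "
          f"{context_overhead(200, 5000, 300):.1f}x")
```

With these made-up numbers, 5,000 retrieved tokens against a 200-token prompt and a 300-token answer is a 10x token overhead, which is the kind of gap the comment describes. The usage dashboard is the only reliable source for your real token counts.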
This is how RAG works.
You can come up with workarounds, such as using lesser LLMs as a pre-filtering step, but the fact is that if you need GPT to read the doc, you need GPT to read the doc.
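The pre-filtering workaround mentioned above can be sketched as "score chunks cheaply, then send only the top few to the expensive model." In this sketch, a plain keyword-overlap score stands in for the "lesser LLM". That substitution is an assumption made to keep the example self-contained. The `scorer` parameter marks where a call to a cheaper model could plug in, and all names here are hypothetical. As the reply points out, this only saves tokens if the cheap step keeps the chunks GPT actually needs.

```python
import re
from typing import Callable, List


def keyword_score(query: str, chunk: str) -> float:
    """Cheap stand-in for a 'lesser LLM' judge: fraction of query words found in the chunk."""
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", chunk.lower()))
    return len(q & c) / len(q) if q else 0.0


def prefilter(query: str, chunks: List[str], top_k: int,
              scorer: Callable[[str, str], float] = keyword_score) -> List[str]:
    """Keep the top_k highest-scoring chunks; only these go to the expensive model."""
    # sorted() is stable, so chunks with equal scores keep their document order.
    ranked = sorted(chunks, key=lambda ch: scorer(query, ch), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    chunks = ["invoice totals for march", "the cat sat", "march invoice tax"]
    print(prefilter("march invoice", chunks, top_k=2))
```

The design trade-off is the one the reply names. A cheaper scorer lowers the per-request bill, but any relevant chunk it drops is a chunk the expensive model never sees.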