Do any of those LLM-as-a-service companies provide a mechanism to "save" a given input, so that you pay only for the state storage and the extra input when continuing the completion from the snapshot? Indeed, at 1M tokens and $15/M tokens, we are talking about $10+ per API call when maxing out the LLM capacity. I see plenty of use cases for such a big context, but re-paying, at every API call, to re-submit the exact same knowledge base seems very inefficient. Right now, only ChatGPT (the webapp) seems to be using such snapshots. Am I missing something?
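To put rough numbers on that, here is a small sketch of the arithmetic. The 1M-token context and the $15/M input price come from the comment above; the 20 calls and the 1,000 new tokens per call are made-up values for illustration, and `resubmission_cost` is just a hypothetical helper name.

```python
def resubmission_cost(context_tokens, new_tokens_per_call, calls, usd_per_million):
    """Total input cost when the full context is re-sent on every call."""
    tokens_per_call = context_tokens + new_tokens_per_call
    return calls * tokens_per_call * usd_per_million / 1_000_000


# Figures from the comment: 1M-token context at $15 per million input tokens.
# The 20 calls and 1,000 new tokens per call are hypothetical.
single_call = resubmission_cost(1_000_000, 0, 1, 15.0)
twenty_calls = resubmission_cost(1_000_000, 1_000, 20, 15.0)
print(f"one call: ${single_call:.2f}")
print(f"20 calls: ${twenty_calls:.2f}")
```

Almost all of the total is the repeated knowledge base, which is the part a saved snapshot would avoid paying for again.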
If you don't care about latency, or can wait to set up a batch of inputs in one go, there's an alternative method. I call it batch prompting, and pretty much everything we do at work with GPT-4 uses it now. If people are interested I'll do a proper writeup on how to implement it, but the general idea is very straightforward and works reliably. I also think this is a much better evaluation of context than needle in a haystack.
Example for classifying game genres from descriptions.
Default:
[Prompt][Functions][Examples][game description]
->
{"genre": [genre], "sub-genre": [sub-genre]}
Batch Prompting:
[Prompt][Functions][Examples]<game1>[description]</game><game2>[description]</game><game3>[description]</game>...
->
{"game1": {...}, "game2": {...}, "game3": {...}, ...}
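A minimal Python sketch of the batch layout above, not the commenter's actual implementation. It only builds the batched prompt and checks the reply; the model call itself is left out and replaced with a stand-in string. The helper names, the instruction text, and the sample descriptions are all hypothetical, and the closing tags are numbered (`</game1>`) here as my own choice so each item is unambiguous.

```python
import json


def build_batch_prompt(instructions, descriptions):
    """Wrap each description in a numbered tag and append it to the shared prefix."""
    items = [
        f"<game{i}>{text}</game{i}>"
        for i, text in enumerate(descriptions, start=1)
    ]
    return instructions + "\n" + "\n".join(items)


def parse_batch_response(raw, count):
    """Parse the model's JSON reply and check that every item was answered."""
    result = json.loads(raw)
    expected = [f"game{i}" for i in range(1, count + 1)]
    missing = [key for key in expected if key not in result]
    if missing:
        raise ValueError(f"model skipped items: {missing}")
    return {key: result[key] for key in expected}


# Hypothetical inputs; the shared prefix stands in for [Prompt][Functions][Examples].
descriptions = [
    "Build and manage a medieval city.",
    "Race futuristic hovercars across neon tracks.",
]
prompt = build_batch_prompt(
    "Classify each game by genre and sub-genre. "
    "Reply with one JSON object keyed by tag name.",
    descriptions,
)

# Stand-in for the model's reply; a real call to your LLM client would go here.
fake_reply = (
    '{"game1": {"genre": "strategy", "sub-genre": "city builder"}, '
    '"game2": {"genre": "racing", "sub-genre": "arcade"}}'
)
labels = parse_batch_response(fake_reply, len(descriptions))
```

The check for missing keys is there because a long batch gives the model room to drop or merge items, which is also what makes this a reasonable test of how well the context is actually being used.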