GPTCache: Slash Your LLM API Costs by 10x (github.com)
18 points by fhaltmayer 1168 days ago
3 comments

Keep up the awesome work. I've run across this problem myself - I somehow used $20 just testing a small demo I made with GPT-3.5.

Since most ML is inherently probabilistic, it seems reasonable to make an LLM cache both semantic and _stochastic_, i.e. you wouldn't want the same answer every time you use "pick me a color" as the prompt. Injecting the original LLM (GPT, Bard, etc.) response as the prompt for Alpaca or some other model could make this cache virtually invisible.
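
A minimal sketch of what a cache that is both semantic and stochastic might look like. It is not GPTCache's implementation or API. Everything here is an assumption for illustration: the `StochasticSemanticCache` class, the `embed` function (a toy bag-of-words stand-in for a real embedding model), the similarity threshold, and the stand-in `llm` callable. Instead of rewriting the cached answer with a second model, this version takes a simpler route: it stores a few responses per prompt and samples among them on a hit.

```python
import math
import random
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class StochasticSemanticCache:
    """Hypothetical cache: similar prompts share an entry, hits are sampled."""

    def __init__(self, llm, threshold=0.8, max_variants=3, rng=None):
        self.llm = llm                    # callable: prompt -> response
        self.threshold = threshold        # minimum similarity for a hit
        self.max_variants = max_variants  # responses collected per entry
        self.rng = rng or random.Random()
        self.entries = []                 # list of (embedding, [responses])
        self.llm_calls = 0

    def _lookup(self, vec):
        """Return the response list of the most similar entry, or None."""
        best, best_sim = None, 0.0
        for stored_vec, responses in self.entries:
            sim = cosine(vec, stored_vec)
            if sim > best_sim:
                best, best_sim = responses, sim
        return best if best_sim >= self.threshold else None

    def query(self, prompt):
        vec = embed(prompt)
        responses = self._lookup(vec)
        if responses is None:
            responses = []
            self.entries.append((vec, responses))
        # Call the LLM until enough variants are stored, then sample.
        if len(responses) < self.max_variants:
            self.llm_calls += 1
            answer = self.llm(prompt)
            responses.append(answer)
            return answer
        return self.rng.choice(responses)


# Stand-in "LLM" that returns a different color on each call.
colors = iter(["red", "green", "blue"])
cache = StochasticSemanticCache(lambda prompt: next(colors), rng=random.Random(0))
answers = [cache.query("pick me a color") for _ in range(10)]
print(cache.llm_calls, answers)
```

In this run the stand-in LLM is called only for the first three queries; the remaining seven are served from the cache, but the answer still varies. The trade-off is that each entry costs `max_variants` paid calls before it starts saving anything.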

The idea of incorporating stochastic behavior into the cache is fascinating, as it would indeed allow for more dynamic and diverse responses to certain types of queries. Combining different LLMs to achieve this could be an interesting approach to explore.
It looks like a game-changer for those working with LLM services. By caching query results, it cuts down the number of requests and tokens sent to the LLM service, leading to a substantial reduction in overall costs.

If you're leveraging LLMs for your projects, it's definitely worth giving GPTCache a look!

Between LangChain and this, it looks like every new LLM API wrapper startup is going to use Python.
It's true that Python seems to be the go-to language for many LLM API wrapper projects. Its popularity in the AI and ML communities might be a contributing factor.
Definitely. And because of AI's now-explosive popularity, Python might actually cement its position as the lingua franca of modern programming.

Not to mention that the ChatGPT code interpreter plugin allows sandboxed Python execution and many beginners are starting to code with LLMs, so nearly everything will be in Python eventually.