GPTCache: Slash Your LLM API Costs by 10x


	GPTCache: Slash Your LLM API Costs by 10x (github.com)
	18 points by fhaltmayer 1168 days ago

3 comments

fzliu 1168 days ago

Keep up the awesome work. I've run across this problem myself - I somehow used $20 just testing a small demo I made with GPT-3.5.

As most ML is inherently probabilistic, it seems reasonable to make an LLM cache both semantic and _stochastic_, i.e. you wouldn't want the same answer every time you use "pick me a color" as the prompt. Injecting the original LLM (GPT, Bard, etc.) response as the prompt for Alpaca or some other model could make this cache virtually invisible.
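To make the "semantic and stochastic" idea concrete, here is a rough sketch, not anything taken from GPTCache itself. Every name in it (`StochasticSemanticCache`, `embed`, `refresh_prob`, `max_pool`) is hypothetical. It assumes a toy bag-of-words similarity in place of a real embedding model, and it swaps the "rephrase with Alpaca" step for something simpler: keep a small pool of real responses per cached prompt and sample from it.

```python
import math
import random
from collections import Counter


def embed(text):
    # Toy bag-of-words vector (assumption); a real system would use an embedding model.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class StochasticSemanticCache:
    """Hypothetical sketch: semantic lookup plus randomized answers on a hit."""

    def __init__(self, llm_call, threshold=0.8, refresh_prob=0.2, max_pool=5, rng=None):
        self.llm_call = llm_call          # function: prompt -> response (the paid call)
        self.threshold = threshold        # minimum similarity to count as a hit
        self.refresh_prob = refresh_prob  # chance of growing the pool on a hit
        self.max_pool = max_pool          # cap on stored responses per entry
        self.rng = rng or random.Random()
        self.entries = []                 # list of (vector, [responses])

    def ask(self, prompt):
        vec = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            pool = best[1]
            # Occasionally pay for a fresh answer so cached replies stay varied.
            if len(pool) < self.max_pool and self.rng.random() < self.refresh_prob:
                pool.append(self.llm_call(prompt))
                return pool[-1]
            return self.rng.choice(pool)
        # Cache miss: call the model and start a new pool.
        answer = self.llm_call(prompt)
        self.entries.append((vec, [answer]))
        return answer
```

With `refresh_prob=0` this degrades to an ordinary semantic cache; raising it trades some of the cost savings for more varied answers to prompts like "pick me a color".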


cxie 1168 days ago

The idea of incorporating stochastic behavior into the cache is fascinating, as it would indeed allow for more dynamic and diverse responses to certain types of queries. Combining different LLMs to achieve this could be an interesting approach to explore.


cxie 1168 days ago

It looks like a game-changer for those working with LLM services. By caching query results, it effectively cuts down the number of requests and token count sent to the LLM service, leading to a substantial reduction in overall costs.

If you're leveraging LLMs for your projects, it's definitely worth giving GPTCache a look!
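For anyone wondering where the savings come from, a minimal sketch of the basic mechanism follows. It is an exact-match cache only, and it is not GPTCache's API: `ExactMatchCache` and `llm_call` are hypothetical names, and the normalization (strip plus lowercase) is an assumption made for illustration.

```python
import hashlib


class ExactMatchCache:
    """Hypothetical exact-match prompt cache; not GPTCache's actual interface."""

    def __init__(self, llm_call):
        self.llm_call = llm_call  # function: prompt -> response (the paid call)
        self.store = {}           # prompt hash -> cached response
        self.hits = 0
        self.misses = 0

    def ask(self, prompt):
        # Normalize lightly so trivial whitespace/case differences still hit.
        key = hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        # Miss: this is the only path that sends tokens to the LLM service.
        self.misses += 1
        answer = self.llm_call(prompt)
        self.store[key] = answer
        return answer
```

Every hit is a request, and its tokens, that never reaches the service, so the actual saving depends entirely on how often prompts repeat in a given workload.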


tester457 1168 days ago

Between LangChain and this, it looks like every new LLM API wrapper startup is going to use Python.


cxie 1168 days ago

It's true that Python seems to be the go-to language for many LLM API wrapper projects. Its popularity in the AI and ML communities might be a contributing factor.


tester457 1167 days ago

Definitely. And because of AI's now-explosive popularity, Python might actually cement its position as the lingua franca of modern programming.

Not to mention that the ChatGPT code interpreter plugin allows sandboxed Python execution, and many beginners are starting to code with LLMs. Nearly everything will be in Python eventually.
