Hacker News
by fzliu 1168 days ago
Keep up the awesome work. I've run across this problem myself - I somehow used $20 just testing a small demo I made with GPT-3.5.

Since most ML is inherently probabilistic, it seems reasonable to make an LLM cache both semantic and _stochastic_, i.e. you wouldn't want the same answer every time you use "pick me a color" as a prompt. Feeding the original LLM's (GPT, Bard, etc.) cached response as a prompt to Alpaca or some other model, which would then reword it, could make this cache virtually invisible.
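To make that concrete, here's a rough sketch of what I have in mind. It is not how any existing cache library works. Everything in it is made up for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `llm` is whatever callable hits the expensive model, and `rewrite` is a hypothetical hook where a cheaper model (Alpaca or similar) could rephrase the cached answer. The cache keeps a few sampled responses per semantically similar prompt and picks one at random on a hit.

```python
import math
import random
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: a real cache would use an embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class StochasticSemanticCache:
    """Hypothetical cache: semantic lookup plus random choice among stored samples."""

    def __init__(self, llm, rewrite=None, threshold=0.8, max_samples=3, rng=None):
        self.llm = llm                      # expensive model call: prompt -> response
        # Optional cheap-model hook; by default the cached text is returned unchanged.
        self.rewrite = rewrite or (lambda prompt, cached: cached)
        self.threshold = threshold          # minimum similarity to count as a hit
        self.max_samples = max_samples      # responses to collect before serving from cache
        self.rng = rng or random.Random()
        self.entries = []                   # list of (embedding, [responses])

    def _nearest(self, vec):
        """Return the most similar entry, or None if nothing clears the threshold."""
        best, best_sim = None, 0.0
        for entry in self.entries:
            sim = cosine(vec, entry[0])
            if sim > best_sim:
                best, best_sim = entry, sim
        return best if best_sim >= self.threshold else None

    def query(self, prompt):
        vec = embed(prompt)
        entry = self._nearest(vec)
        if entry is None:
            entry = (vec, [])
            self.entries.append(entry)
        responses = entry[1]
        # Until enough samples exist, keep calling the real model to build variety.
        if len(responses) < self.max_samples:
            responses.append(self.llm(prompt))
            return responses[-1]
        # Afterwards, serve a random stored sample, optionally reworded by a cheaper model.
        return self.rewrite(prompt, self.rng.choice(responses))
```

With `max_samples=3`, the first three "pick me a color" calls go to the real model and everything after that is served from the stored samples, so the cost is bounded but the answer still varies. The threshold and sample count are arbitrary here and would need tuning.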

1 comment

The idea of incorporating stochastic behavior into the cache is fascinating, as it would allow more varied responses to open-ended queries where a single fixed answer isn't wanted. Combining different LLMs to achieve this could be an interesting approach to explore.