HN Mirror


by throwaway888abc 796 days ago

Looks great. Do you have any concrete data on how much money it will save?

Also, how does it compare to, for example, GptCache [0] or any other semantic cache solution [1]?

[0] https://gptcache.readthedocs.io/en/latest/

[1] https://portkey.ai/blog/reducing-llm-costs-and-latency-seman...


zaiste 796 days ago

We are still exploring. We don’t have any concrete data yet, but in some cases we've observed cost reductions of up to ten times. This seems especially relevant in specific areas, e.g. chatbots, where similar questions come up more often.
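To illustrate why the savings depend on how often similar questions recur, here is a minimal semantic-cache sketch. It is not GptCache's API or the poster's implementation: the bag-of-words `embed` stands in for a real embedding model, and `SemanticCache`, `fake_llm`, and the 0.8 similarity threshold are hypothetical. If the lookup itself is treated as free, spend scales with cache misses, so a 10x reduction would correspond to roughly 90% of requests being answered from the cache.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase bag of words with "?" stripped.
    # Assumption: a real semantic cache would use a sentence-embedding model.
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    # Hypothetical cache: returns a stored answer when a new prompt is
    # "close enough" to a previously seen one, otherwise calls the LLM.
    def __init__(self, llm: Callable[[str], str], threshold: float = 0.8):
        self.llm = llm
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []
        self.llm_calls = 0

    def lookup(self, prompt: str) -> Optional[str]:
        # Linear scan for the most similar cached prompt.
        q = embed(prompt)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def ask(self, prompt: str) -> str:
        cached = self.lookup(prompt)
        if cached is not None:
            return cached  # cache hit: no paid LLM call
        self.llm_calls += 1
        answer = self.llm(prompt)
        self.entries.append((embed(prompt), answer))
        return answer


def fake_llm(prompt: str) -> str:
    # Stand-in for a paid LLM API call (hypothetical).
    return f"answer to: {prompt}"


cache = SemanticCache(fake_llm, threshold=0.8)
questions = [
    "How do I reset my password?",
    "how do I reset my password",
    "How can I reset my password?",   # 5 of 6 words shared -> similarity ~0.83
    "What are your opening hours?",   # no overlap -> cache miss
]
answers = [cache.ask(q) for q in questions]
print(f"{cache.llm_calls} LLM calls for {len(questions)} questions")
print(f"cost reduction factor: {len(questions) / cache.llm_calls:.1f}x")
```

Here three paraphrases of the same chatbot question cost one LLM call, which is the effect described above; the threshold trades hit rate against the risk of returning an answer to a subtly different question.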


throwaway888abc 796 days ago

> We are still exploring.

Fair point. One thing worth looking into is creating/training/fine-tuning a small model (2B/7B) on previously cached answers, in case your knowledge index/domain doesn't change over time.

Exciting times
