Hacker News
by throwaway888abc 796 days ago
Looks great. Do you have any concrete data on how much money it will save?

Also, how does it compare to, for example, GPTCache[0], or to any other semantic cache solution[1]?

[0] https://gptcache.readthedocs.io/en/latest/

[1] https://portkey.ai/blog/reducing-llm-costs-and-latency-seman...

1 comments

We are still exploring. We don't have any concrete data yet, but in some instances we've observed reductions of up to 10x. This seems especially relevant in specific areas, e.g. chatbots, where similar questions come up repeatedly.
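
For anyone wondering why repeated, similar questions matter so much here, this is a rough sketch of a semantic cache lookup. It is not how this project or GPTCache is implemented. `embed` is a toy bag-of-words stand-in for a real embedding model, and `SemanticCache`, `answer`, `call_llm` and the 0.8 threshold are all made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache: returns a stored answer for a similar enough question."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cutoff, would need tuning
        self.entries = []           # list of (embedding, answer)
        self.hits = 0
        self.misses = 0

    def get(self, question):
        q = embed(question)
        # Linear scan for the closest stored question; a real system
        # would use a vector index instead.
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            self.hits += 1
            return best[1]
        self.misses += 1
        return None

    def put(self, question, answer):
        self.entries.append((embed(question), answer))


def answer(cache, question, call_llm):
    """Serve from the cache when possible, otherwise pay for a model call."""
    cached = cache.get(question)
    if cached is not None:
        return cached
    result = call_llm(question)
    cache.put(question, result)
    return result
```

Every hit skips one paid model call, so the saving depends almost entirely on the hit rate, which is why chatbot-style traffic looks like the best case.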
>We are still exploring.

Fair point. One thing worth looking into is training or fine-tuning a small model (2B/7B) on previously cached answers, in case your knowledge index/domain does not change over time.
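
A rough sketch of the first step of that idea, dumping cached question/answer pairs into JSONL for fine-tuning. The `prompt`/`completion` field names and the `cache_to_jsonl` helper are assumptions for illustration, not any particular trainer's required format.

```python
import json


def cache_to_jsonl(entries):
    """Turn (question, answer) pairs into JSONL, one training example per line.

    Skips empty answers and exact-duplicate questions (case-insensitive).
    """
    seen = set()
    lines = []
    for question, answer in entries:
        key = question.strip().lower()
        if key in seen or not answer.strip():
            continue
        seen.add(key)
        lines.append(json.dumps({
            "prompt": question.strip(),       # hypothetical field name
            "completion": answer.strip(),     # hypothetical field name
        }))
    return "\n".join(lines)
```

This only makes sense if the cached answers stay valid, hence the condition about the domain not changing over time.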

Exciting times