Hacker News new | ask | show | jobs
by adultSwim 1101 days ago
Running an LLM every time someone clicks on a button is expensive and slow in production, but probably still ~10x cheaper to produce than code.
1 comments

New techniques like semantic caching will help. This is the modern era's version of building a performant social graph.
What's semantic caching?
With LLMs, the inputs are highly variable so exact match caching is generally less useful. Semantic caching groups similar inputs and returns relevant results accordingly. So {"dish":"spaghetti bolognese"} and {"dish":"spaghetti with meat sauce"} could return the same cached result.
Or store as sentence embedding and calculate the vector distance, but creates many edge cases