| HN Mirror



	by Talderigi 100 days ago
	Curious how the semantic caching layer works. Are you embedding requests on the gateway side and doing a vector similarity lookup before proxying? If so, how do you handle cache invalidation when the underlying model changes or gets updated?

1 comment

giorgi_pro 100 days ago

Hey, contributor here. That's right: GoModel embeds requests and does a vector similarity lookup before proxying. As for cache invalidation, there is no "purging" involved, because the model is part of the namespace (params_hash includes the LLM model, path, guardrails hash, etc.). A request for a changed model hashes to a different namespace, so it never matches the old entries, and TTL takes care of cleaning those up later.
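
To make the namespace-plus-TTL idea concrete, here is a minimal sketch in Python. It is not GoModel's actual code: the `SemanticCache` class, the exact fields fed into `params_hash`, the similarity threshold, and the in-memory list scan are all assumptions for illustration, and embeddings are passed in as plain lists rather than produced by a real embedding model.

```python
import hashlib
import json
import math
import time


def params_hash(model, path, guardrails_hash):
    """Hypothetical namespace key: changing any field yields a different namespace."""
    payload = json.dumps(
        {"model": model, "path": path, "guardrails": guardrails_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Illustrative in-memory cache: one list of entries per namespace."""

    def __init__(self, ttl_seconds=3600.0, threshold=0.9, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.threshold = threshold  # assumed minimum similarity for a hit
        self.clock = clock          # injectable so expiry can be tested
        self.namespaces = {}        # params_hash -> [(embedding, response, expires_at)]

    def put(self, namespace, embedding, response):
        entry = (list(embedding), response, self.clock() + self.ttl)
        self.namespaces.setdefault(namespace, []).append(entry)

    def get(self, namespace, embedding):
        now = self.clock()
        # Lazy TTL cleanup: expired entries are dropped when their namespace is read.
        live = [e for e in self.namespaces.get(namespace, []) if e[2] > now]
        if live:
            self.namespaces[namespace] = live
        else:
            self.namespaces.pop(namespace, None)
        # Only entries in the same namespace are candidates for a similarity match.
        best = max(live, key=lambda e: cosine(e[0], embedding), default=None)
        if best is not None and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None
```

In this sketch, a response stored under `params_hash("model-a", ...)` is never returned for a lookup under `params_hash("model-b", ...)`, however similar the embeddings are, so a model change needs no explicit purge. The stale entries simply stop being read and disappear once their TTL passes. A real gateway would likely use a vector index and a store with native expiry instead of a linear scan.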
