
by davidbarker 631 days ago

I was impressed by Upstash's approach to something similar with their "Semantic Cache".

https://github.com/upstash/semantic-cache

  "Semantic Cache is a tool for caching natural text based on semantic similarity. It's ideal for any task that involves querying or retrieving information based on meaning, such as natural language classification or caching AI responses. Two pieces of text can be similar but not identical (e.g., "great places to check out in Spain" vs. "best places to visit in Spain"). Traditional caching doesn't recognize this semantic similarity and misses opportunities for reuse."


OutOfHere 631 days ago

I strongly advise against relying on embedding distance alone for cache lookups, because it'll match these two:

1. great places to check out in Spain

2. great places to check out in northern Spain

Logically the two are not the same, and they could in fact be very different despite their semantic similarity. Your users will be frustrated and will hate you for it. If an LLM validates the two as being the same, then it's fine, but not otherwise.
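
A minimal sketch of that guard, under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `toy_validator` is a placeholder for the LLM equivalence check, and the 0.9 threshold is arbitrary. None of these names come from Upstash's library.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def distance_only_hit(query: str, cached_query: str, threshold: float = 0.9) -> bool:
    """The risky version: a cache hit is decided by similarity alone."""
    return cosine_similarity(embed(query), embed(cached_query)) >= threshold


def validated_hit(
    query: str,
    cached_query: str,
    validator: Callable[[str, str], bool],
    threshold: float = 0.9,
) -> bool:
    """Similarity only nominates a candidate; the validator has the final say."""
    return distance_only_hit(query, cached_query, threshold) and validator(
        query, cached_query
    )


def toy_validator(query: str, cached_query: str) -> bool:
    """Placeholder for an LLM call asking "do these two ask for the same thing?".

    Here it only accepts queries made of the same words, ignoring case and order.
    """
    return sorted(query.lower().split()) == sorted(cached_query.lower().split())


cached = "great places to check out in Spain"
incoming = "great places to check out in northern Spain"

# Similarity is about 0.94 with this toy embedding, so distance alone says "hit".
print(distance_only_hit(incoming, cached))             # True
# The validator rejects it because "northern" changes what is being asked.
print(validated_hit(incoming, cached, toy_validator))  # False
```

The validator only runs on queries that already cleared the similarity threshold, so the extra cost is paid on candidate hits rather than on every lookup.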


DeveloperErrata 631 days ago

I agree, a naive approach to approximate caching would probably not work for most use cases.

I'm speculating here, but I wonder if you could use a two-stage pipeline for cache retrieval (kinda like the distance search + reranker model technique used by lots of RAG pipelines). Maybe it would be possible to fine-tune a custom reranker model to output True only if two queries are semantically equivalent rather than just similar. So the hypothetical model would output True for "how to change the oil" vs. "how to replace the oil" but False for your Spain example. In this case you'd do distance-based retrieval first using the normal vector DB techniques, and then use your custom reranker to validate that the potential cache hits are actual hits.
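
Roughly, the two stages could look like the sketch below. Everything in it is hypothetical: `embed` is a toy bag-of-words stand-in for a real embedding model, the linear scan stands in for a vector DB query, and `toy_equivalent` (a tiny synonym table) stands in for the fine-tuned reranker.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical synonym table; a real reranker would be a trained model.
SYNONYMS = {"replace": "change"}


def toy_equivalent(q1: str, q2: str) -> bool:
    """Stand-in reranker: True only if the queries match after synonym mapping."""
    def normalize(q: str) -> List[str]:
        return [SYNONYMS.get(word, word) for word in q.lower().split()]
    return normalize(q1) == normalize(q2)


def lookup(
    query: str,
    cache: List[Tuple[str, str]],  # (cached query, cached response) pairs
    equivalent: Callable[[str, str], bool] = toy_equivalent,
    k: int = 3,
    min_similarity: float = 0.7,
) -> Optional[str]:
    query_vec = embed(query)
    # Stage 1: distance-based retrieval of the top-k sufficiently similar entries.
    scored = [(cosine(query_vec, embed(cq)), cq, resp) for cq, resp in cache]
    candidates = sorted(
        (s for s in scored if s[0] >= min_similarity),
        key=lambda s: s[0],
        reverse=True,
    )[:k]
    # Stage 2: the reranker decides whether a candidate is a real hit.
    for _, cached_query, response in candidates:
        if equivalent(query, cached_query):
            return response
    return None  # cache miss: fall through to the expensive LLM call


cache = [
    ("how to change the oil", "cached oil answer"),
    ("great places to check out in Spain", "cached Spain answer"),
]
print(lookup("how to replace the oil", cache))                       # cached oil answer
print(lookup("great places to check out in northern Spain", cache))  # None
```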


OutOfHere 631 days ago

Any LLM can make that call, but yes, a fine-tuned model has the benefit of needing a shorter prompt.


jankovicsandras 630 days ago

A hybrid search approach might help, like combining vector similarity scores with e.g. BM25 scores.

Shameless plug (FOSS): https://github.com/jankovicsandras/plpgsql_bm25 Okapi BM25 search implemented in PL/pgSQL for Postgres.
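
For illustration, here is a small pure-Python sketch of the idea (not the PL/pgSQL code from the repo above). It computes Okapi BM25 with the common `k1 = 1.2`, `b = 0.75` defaults and blends it with vector similarity scores after min-max normalization. The `alpha` weight and the normalization choice are assumptions; the vector scores are expected to come from whatever embedding search you already run.

```python
import math
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def bm25_scores(
    query: str, docs: Sequence[str], k1: float = 1.2, b: float = 0.75
) -> List[float]:
    """Okapi BM25 score of `query` against each document in `docs`."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))  # each distinct query term counted once
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+ 1" inside the log).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1.0 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / denom
        scores.append(score)
    return scores


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale to [0, 1]; all-equal inputs carry no ranking signal, so map to 0."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(
    bm25: Sequence[float], vector: Sequence[float], alpha: float = 0.5
) -> List[float]:
    """Weighted blend: alpha for vector similarity, (1 - alpha) for BM25."""
    if len(bm25) != len(vector):
        raise ValueError("score lists must have the same length")
    return [
        alpha * v + (1.0 - alpha) * k
        for k, v in zip(min_max(bm25), min_max(vector))
    ]
```

The lexical side helps when a rare term such as "northern" separates two cached queries that sit close together in embedding space: BM25 gives that term a high IDF weight, so the entry containing it ranks first. It still only ranks candidates and does not decide whether two queries are equivalent.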
