Hacker News
by simonw 701 days ago
It turns out that picking that threshold is extremely difficult (I've tried!). The right value seems to differ from one search to the next, so picking e.g. 0.7 as a fixed cutoff isn't as useful as you would expect.
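A minimal sketch of what a fixed cosine-similarity cutoff looks like, assuming embeddings are plain lists of floats. The vectors, document ids and the `filter_by_threshold` helper are hypothetical toy values, not real model output. The point is only that one global cutoff can keep several results for one query and nothing at all for another.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filter_by_threshold(query_vec, docs, threshold=0.7):
    """Return (doc_id, score) pairs with score >= threshold, best first.

    `docs` maps a hypothetical document id to its embedding vector.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    kept = [(doc_id, s) for doc_id, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings, for illustration only.
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.8, 0.6, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}

query_1 = [1.0, 0.1, 0.0]   # points almost the same way as doc_a and doc_b
query_2 = [0.5, -0.5, 0.5]  # best matches (doc_a, doc_c) only score about 0.58

print(filter_by_threshold(query_1, docs))  # keeps doc_a and doc_b
print(filter_by_threshold(query_2, docs))  # keeps nothing
```

If doc_a or doc_c were in fact relevant to the second query, the same 0.7 cutoff that works for the first query would silently drop them.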

Agreed that thresholds don't work when applied to the cosine similarity of embeddings. But I have found that the relevance scores returned by high-quality rerankers, especially Cohere's, are consistent and meaningful enough that a threshold works well there.
I use a similarity threshold (to remove completely irrelevant results) and then a reranker to get the top N.
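A rough sketch of that two-stage setup, under stated assumptions: a loose cosine cutoff first, then a reranker to order the survivors and keep the top N, with an optional cutoff on the reranker score. The reranker is passed in as a plain function `rerank_score(query, text) -> float`. In practice this would wrap a cross-encoder or a hosted reranker such as Cohere's, but no real API call is shown. `toy_rerank_score`, `retrieve`, the parameter names and the corpus are all hypothetical stand-ins.

```python
import math
from typing import Callable, Optional

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(
    query: str,
    query_vec: list[float],
    corpus: dict[str, tuple[str, list[float]]],
    rerank_score: Callable[[str, str], float],
    sim_threshold: float = 0.3,
    top_n: int = 3,
    min_rerank_score: Optional[float] = None,
) -> list[tuple[str, float]]:
    # Stage 1: a deliberately loose cosine cutoff, only meant to drop clearly irrelevant docs.
    candidates = [
        (doc_id, text)
        for doc_id, (text, vec) in corpus.items()
        if cosine_similarity(query_vec, vec) >= sim_threshold
    ]
    # Stage 2: score the survivors with the reranker, best first.
    reranked = sorted(
        ((doc_id, rerank_score(query, text)) for doc_id, text in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Optional cutoff on the reranker's own score.
    if min_rerank_score is not None:
        reranked = [(d, s) for d, s in reranked if s >= min_rerank_score]
    return reranked[:top_n]

# Hypothetical stand-in reranker: fraction of query words that appear in the text.
def toy_rerank_score(query: str, text: str) -> float:
    q_words = set(query.lower().split())
    return len(q_words & set(text.lower().split())) / len(q_words)

# Hypothetical corpus: doc_id -> (text, embedding).
corpus = {
    "d1": ("cohere rerank thresholds", [1.0, 0.0]),
    "d2": ("embedding similarity thresholds", [0.9, 0.4]),
    "d3": ("banana bread recipe", [0.0, 1.0]),
}

print(retrieve("rerank thresholds", [1.0, 0.1], corpus, toy_rerank_score, top_n=2))
```

The cosine cutoff stays low here on purpose, since it only has to remove the obviously unrelated d3. Any stricter relevance decision is left to the reranker score.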