by wilsonzlin 773 days ago
Thanks! Yeah, sometimes one or two far-away results make the auto zoom seem strange. It's something I'd like to tune, perhaps by zooming to where most, but not all, of the results are.
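One way the "most but not all" zoom could be approached is to trim a small fraction of extreme values on each axis before computing the bounding box. This is only a sketch under assumptions: the function name `trimmed_bounds`, the `trim` parameter, and the idea that results are plain (x, y) pairs are all hypothetical, not how the site actually computes its zoom.

```python
def trimmed_bounds(points, trim=0.05):
    """Return (min_x, min_y, max_x, max_y) after dropping roughly `trim`
    of the values from each end of each axis.

    Hypothetical helper: `points` is assumed to be a list of (x, y) pairs.
    Trimming is done per axis, so this is an approximation of "where most
    results are", not a joint outlier test.
    """
    if not points:
        raise ValueError("need at least one point")

    n = len(points)
    # Number of values to ignore at each end, capped so something remains.
    k = min(int(round(trim * n)), (n - 1) // 2)

    def axis_range(values):
        ordered = sorted(values)
        return ordered[k], ordered[n - 1 - k]

    min_x, max_x = axis_range([p[0] for p in points])
    min_y, max_y = axis_range([p[1] for p in points])
    return min_x, min_y, max_x, max_y


# Twenty clustered results plus one far-away result.
results = [(i, i) for i in range(20)] + [(1000, 1000)]
bounds = trimmed_bounds(results, trim=0.05)
```

With a small `trim`, a single distant result no longer stretches the viewport, at the cost of also clipping the nearest edge of the cluster slightly.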

Embeddings are often not that good for comparing the similarity of text. A cross-encoder might be a good alternative, perhaps as a second pass, since you already have the embeddings: https://www.sbert.net/docs/pretrained_cross-encoders.html Scoring every pair this way can be quite slow, but as a second pass over a shortlist it might give much higher quality. Obviously this gets into LLM territory, but the language models for this can be small, and more reliable than cosine similarity on embeddings.
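A rough sketch of the second-pass idea: shortlist by cosine similarity on the existing embeddings, then reorder only that shortlist with a pairwise scorer. The names `two_pass_search`, `pair_scorer`, `first_pass_k`, and `final_k` are made up for illustration, and the toy word-overlap scorer stands in for a real cross-encoder so the example runs without any model download.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_pass_search(query_text, query_vec, docs, pair_scorer,
                    first_pass_k=50, final_k=10):
    """docs is a list of (text, vector) pairs; returns [(text, score), ...].

    pair_scorer is assumed to take a list of (query, text) pairs and return
    one relevance score per pair, higher meaning more relevant.
    """
    # First pass: cheap cosine ranking over every document.
    shortlist = sorted(docs, key=lambda d: cosine(query_vec, d[1]),
                       reverse=True)[:first_pass_k]
    # Second pass: the slower pairwise scorer only sees the shortlist.
    scores = pair_scorer([(query_text, text) for text, _ in shortlist])
    ranked = sorted(zip(shortlist, scores), key=lambda item: item[1],
                    reverse=True)
    return [(doc[0], float(score)) for doc, score in ranked[:final_k]]


def overlap_scorer(pairs):
    """Toy stand-in for a cross-encoder: counts shared lowercase words."""
    return [len(set(q.lower().split()) & set(t.lower().split()))
            for q, t in pairs]


docs = [("cats purr", [1.0, 0.0]),
        ("dogs bark", [0.9, 0.1]),
        ("stock prices", [0.0, 1.0])]
hits = two_pass_search("dogs bark loudly", [1.0, 0.0], docs,
                       overlap_scorer, first_pass_k=2, final_k=2)
```

Assuming the sentence-transformers package is installed, something like `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` could be passed as `pair_scorer` in place of the toy scorer; that model choice is only an example. Keeping `first_pass_k` small is what keeps the pairwise cost manageable.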