by yorwba
827 days ago
Works for:

Bitext mining: given a sentence in one language, find its translation in a collection of sentences in another language using the cosine similarity of embeddings.

Classification: identify the kind of text you're dealing with using logistic regression on the embeddings.

Clustering: group similar texts together using k-means clustering on the embeddings.

Pair Classification: determine whether two texts are paraphrases of each other by using a binary threshold on the cosine similarity of the embeddings.

Reranking: given a query and a list of potential results, sort relevant results ahead of irrelevant ones by sorting according to the cosine similarity of embeddings.

Etc. etc.

These are MTEB benchmark tasks: https://arxiv.org/pdf/2210.07316.pdf

If you have no need for something like that, good for you; you don't need to care how well embeddings work for these tasks.
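To make the cosine-similarity tasks concrete, here's a minimal sketch of reranking and pair classification in plain Python. The 3-dimensional vectors are made up for illustration (a real embedding model would produce them), the function names are my own, and the 0.8 threshold is an arbitrary assumption, not something MTEB prescribes.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rerank(query_vec, candidates):
    """Sort (label, vector) candidates by descending similarity to the query."""
    return sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)

def is_paraphrase(vec_a, vec_b, threshold=0.8):
    """Pair classification: binary threshold on cosine similarity.
    The default threshold is an assumption; in practice you'd tune it."""
    return cosine(vec_a, vec_b) >= threshold

# Made-up toy "embeddings", purely illustrative.
query = [1.0, 0.0, 0.0]
candidates = [
    ("irrelevant", [0.0, 1.0, 0.0]),
    ("relevant", [0.9, 0.1, 0.0]),
    ("somewhat relevant", [0.5, 0.5, 0.0]),
]

ranked = [label for label, _ in rerank(query, candidates)]
best_match = ranked[0]  # bitext mining is the same idea: keep only the top hit
```

Bitext mining, pair classification and reranking all reduce to the same `cosine` call; they differ only in whether you take the top hit, apply a threshold, or sort the whole list.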