by celestialcheese
1176 days ago
Very interested in this. I've been using embeddings / semantic search for information retrieval from PDFs with ada-002, and have been impressed by the results in testing. The reasons the article listed, namely a) lock-in and b) cost, have given me pause about embedding our whole corpus of data. I'd much rather use an open model, but I don't have much experience evaluating these embedding models and their search performance; this is all still very new to me. Along the lines of what you did with ada-002 vs Instruct XL, have there been any papers or prior work evaluating the different embedding models?
|
Generally, MiniLM is a good baseline. For faster models, you want this library:
https://github.com/oborchers/Fast_Sentence_Embeddings
For higher-quality ones, just take the bigger/slower models in the SentenceTransformers library.
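
If you want to compare models on your own data rather than rely on published numbers, a small recall@k check over a handful of hand-labelled query/document pairs is one way to start. Below is a rough sketch, not a definitive benchmark. The names `cosine`, `recall_at_k` and `minilm_embed` are made up for illustration, and `all-MiniLM-L6-v2` is only my assumption for which MiniLM checkpoint to use as the baseline (it needs `pip install sentence-transformers` and downloads the model on first use).

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
# Any function that maps a list of texts to one vector per text.
EmbedFn = Callable[[list], list]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(embed: EmbedFn, docs: list, queries: list,
                relevant: list, k: int = 3) -> float:
    """Fraction of queries whose labelled document lands in the top k.

    relevant[i] is the index into docs of the correct hit for queries[i].
    """
    doc_vecs = embed(docs)
    query_vecs = embed(queries)
    hits = 0
    for query_vec, rel in zip(query_vecs, relevant):
        # Rank all document indices by similarity to this query, best first.
        ranked = sorted(range(len(docs)),
                        key=lambda i: cosine(query_vec, doc_vecs[i]),
                        reverse=True)
        if rel in ranked[:k]:
            hits += 1
    return hits / len(queries)


def minilm_embed(texts: list) -> list:
    """Baseline embedder; the checkpoint name is an assumption, swap in your own."""
    # Imported lazily so the harness above runs without the library installed.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts).tolist()
```

Usage would be something like `recall_at_k(minilm_embed, docs, queries, relevant, k=3)`, then the same call with a wrapper around whichever other model you are considering, so both are scored on identical queries. The brute-force ranking is fine for a few thousand chunks; for a full corpus you would want a vector index instead.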