by suprgeek
788 days ago
Great project and excellent initiative to learn about embeddings.
Two possible avenues to explore more.
Your system backend could be thought of as being composed of two parts:
|Icons->Embedder->|PGVector|->Retriever->Display Result|

1. In the embedder part, try out different embedding models and/or vector dimensions to see whether Recall@K and Precision@K for your data set (icons) improve. Models make a surprising amount of difference to the quality of the results. Try the MTEB Leaderboard for ideas on which models to explore.

2. In the information retriever part, you can try a couple of approaches:

a. After you retrieve from PGVector, see if you can use a reranker like Cohere to get better results: https://cohere.com/blog/rerank

b. You could try a "fusion ranking" similar to the one you do, but structured so that 50% of the weight goes to a plain old keyword search in the metadata and 50% goes to the embedding-based search.

Finally, something more interesting to noodle on: what if the embeddings were based on the icon images and the model knew how to search for a textual description in the latent space?
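To make points 1 and 2b a bit more concrete, here is a rough Python sketch of how you might score a model with Precision@K / Recall@K and how a 50/50 fusion could be computed. Everything here is an assumption on my part, not how your project does it: the function names are hypothetical, the inputs are plain dicts of icon id to score rather than real PGVector or keyword-search results, and min-max normalization is just one simple way to put the two score scales on equal footing before weighting.

```python
from typing import Dict, List, Set


def precision_at_k(ranked_ids: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for icon_id in ranked_ids[:k] if icon_id in relevant)
    return hits / k


def recall_at_k(ranked_ids: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that show up in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for icon_id in ranked_ids[:k] if icon_id in relevant)
    return hits / len(relevant)


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so both searches are comparable (assumed choice)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {icon_id: 1.0 for icon_id in scores}
    return {icon_id: (s - lo) / (hi - lo) for icon_id, s in scores.items()}


def fuse(
    keyword_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    keyword_weight: float = 0.5,
) -> List[str]:
    """Weighted fusion: keyword_weight for keyword search, the rest for embeddings.

    An id missing from one of the two result sets contributes 0 for that side.
    Ties are broken by id so the ordering is deterministic.
    """
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    fused = {
        icon_id: keyword_weight * kw.get(icon_id, 0.0)
        + (1.0 - keyword_weight) * vec.get(icon_id, 0.0)
        for icon_id in set(kw) | set(vec)
    }
    return sorted(fused, key=lambda icon_id: (-fused[icon_id], icon_id))


if __name__ == "__main__":
    # Made-up example data: higher score means a better match in both dicts.
    keyword_hits = {"home": 3.0, "house": 1.0}
    vector_hits = {"house": 0.9, "home": 0.8, "cabin": 0.7}
    ranking = fuse(keyword_hits, vector_hits)
    print(ranking)
    print(precision_at_k(ranking, {"home", "cabin"}, k=2))
    print(recall_at_k(ranking, {"home", "cabin"}, k=2))
```

With a small hand-labelled set of query to relevant-icon pairs, you could run the same metrics once per embedding model (or per keyword_weight value) and compare the numbers instead of eyeballing results. If your vector search returns distances rather than similarities, you would need to flip them before fusing.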