by m-i-l
916 days ago
> "Has somebody experience with Apache Lucene / Solr or Elasticsearch?"

I've been working on a RAG with Solr, and quickly hit some of the issues you describe when dealing with real-world messy data and user input. For example, using all-MiniLM-L6-v2 and cosine similarity, "Can you summarize Immanuel Kant's biography?" matched a chunk containing just the word "Biography" rather than one which started "Immanuel Kant, born in 1724...", and "How high is Ben Nevis?" matched a chunk of text about someone called Benjamin rather than a chunk about mountains containing the words "Ben Nevis" and its height[0].

Switching embedding model has helped, but I'm still not convinced that vector search alone is the silver bullet some claim it is. Still lots more to try though, e.g. hybrid search[1], query expansion[2], knowledge graphs etc.

[0] https://www.michael-lewis.com/posts/vector-search-and-retrie...

[1] https://sease.io/2023/12/hybrid-search-with-apache-solr.html

[2] https://news.ycombinator.com/item?id=38706913
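
For anyone wondering what the hybrid search idea might look like in practice, here is a rough sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking with a vector ranking. This is only an illustration under assumptions: it is not necessarily the approach described in [1] or what Solr does internally, the chunk IDs and both result lists are made up, and k=60 is just a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); the per-list scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for "How high is Ben Nevis?" (IDs invented for illustration).
keyword_hits = ["ben_nevis", "scottish_mountains", "benjamin_bio"]  # e.g. a BM25 ranking
vector_hits = ["benjamin_bio", "ben_nevis", "scottish_mountains"]   # e.g. a cosine-similarity ranking

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

In this made-up case the vector ranking alone puts the Benjamin chunk first, but the Ben Nevis chunk ranks well in both lists, so it comes out on top after fusion.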