| So I've done a ton of work in this area. A few learnings I've collected: 1. Lexical search with BM25 alone gives you very relevant results if you can do some work at ingestion time with an LLM. 2. Embeddings work well only when the size of the query is roughly on the same order as what you're actually storing in the embedding store. 3. Generating a hypothetical answer from a query with an LLM, and then using that hypothetical answer to query the embeddings, works really well. Combining all 3 learnings, we landed on a knowledge decomposition and extraction step very similar to yours. But we stick in a metaprompter to essentially auto-generate the domain / entity types. LLMs are naively bad at identifying the correct level of granularity for the decomposed knowledge. One trick we found is to ask the LLM to output a mermaid.js mindmap that hierarchically breaks the input down into a tree. At the end of that output, ask the LLM to state which level is the appropriate root for a knowledge node. The node is then used to generate questions that could be answered from the knowledge it contains. We then index the text of these questions and also embed them. You can directly match the user's query against these questions using purely BM25 and get good outputs. A hybrid approach works even better, though not by that much. Not using LLMs at query time also means we can hierarchically walk down from the root into deeper and deeper nodes, using the embedding similarity as a cost function for the traversal. |
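For anyone who wants to picture the "match the query against generated questions with BM25" part, here is a minimal sketch in plain Python. It is not the commenter's actual code: the `QuestionIndex` class, the node ids, the sample questions, and the `k1`/`b` defaults are all assumptions made for illustration, and the embedding / hybrid side is left out entirely. The questions are assumed to have been generated by an LLM at ingestion time, each tagged with the knowledge node it came from.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


class QuestionIndex:
    """BM25 over generated questions; each question points back to a node id.

    Hypothetical helper for illustration only.
    """

    def __init__(self, questions, k1=1.5, b=0.75):
        # questions: list of (node_id, question_text) pairs.
        self.node_ids = [node_id for node_id, _ in questions]
        self.docs = [Counter(tokenize(text)) for _, text in questions]
        self.lengths = [sum(doc.values()) for doc in self.docs]
        self.avg_len = sum(self.lengths) / len(self.docs)
        self.k1, self.b = k1, b
        n = len(self.docs)
        # Document frequency: number of questions containing each term.
        df = Counter(term for doc in self.docs for term in doc)
        self.idf = {
            term: math.log(1 + (n - freq + 0.5) / (freq + 0.5))
            for term, freq in df.items()
        }

    def score(self, query, i):
        doc, doc_len = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            length_norm = 1 - self.b + self.b * doc_len / self.avg_len
            total += self.idf[term] * tf * (self.k1 + 1) / (tf + self.k1 * length_norm)
        return total

    def search(self, query, top_k=3):
        # Return (score, node_id) pairs, best first, dropping zero-score questions.
        scored = [(self.score(query, i), self.node_ids[i]) for i in range(len(self.docs))]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


# Made-up questions standing in for the LLM-generated ones.
questions = [
    ("node-billing", "How do I update my credit card for billing?"),
    ("node-billing", "Where can I download past invoices?"),
    ("node-auth", "How do I reset my password?"),
    ("node-auth", "How do I enable two-factor authentication?"),
]
index = QuestionIndex(questions)
results = index.search("reset password")
```

The user query is short and question-shaped, and so are the indexed texts, which is the same "query and stored item should be of similar size" point from learning 2, just applied to lexical matching. A hybrid version would presumably combine these scores with embedding similarity over the same questions, but how the two are weighted is not described above.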
Ha, that's brilliant. Thanks for sharing this!