by minxomat
1191 days ago
I'm building this for (mostly) non-scientific non-fiction works (books, articles, news, etc.). It's launching soon with about 7,500 books indexed. What I found useful for building a graph between "topics" or entities was to use a HyDE[1] prompt to generate several distinct possible definitions for each one, and then build a nearest-neighbor network from those. This successfully identifies concepts that are related in the truly abstract sense, rather than just literal entities.
[1] https://summarity.com/hyde
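A rough sketch of the "definitions, then nearest-neighbor network" idea, under heavy assumptions: the definitions below are hand-written stand-ins for what a HyDE-style LLM prompt would generate, the "embedding" is a toy bag-of-words vector in place of a real embedding model, and pooling each topic's definitions by summing their vectors is my own choice, not necessarily what the site does. All names (`embed`, `knn_graph`, `DEFINITIONS`) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stopword list for the toy embedding below.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "over"}


def embed(text: str) -> Dict[str, float]:
    """Toy stand-in for a real embedding model: an L2-normalized bag of words."""
    counts = Counter(w for w in text.lower().split() if w not in STOPWORDS)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}


def topic_vector(definitions: List[str]) -> Dict[str, float]:
    """Pool a topic's definition embeddings by summing (same direction as the mean)."""
    pooled: Dict[str, float] = {}
    for d in definitions:
        for w, v in embed(d).items():
            pooled[w] = pooled.get(w, 0.0) + v
    return pooled


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(defs_by_topic: Dict[str, List[str]], k: int = 2) -> Dict[str, List[Tuple[str, float]]]:
    """For each topic, keep its k most similar other topics as weighted edges."""
    vecs = {t: topic_vector(ds) for t, ds in defs_by_topic.items()}
    graph: Dict[str, List[Tuple[str, float]]] = {}
    for t, v in vecs.items():
        scores = [(o, cosine(v, ov)) for o, ov in vecs.items() if o != t]
        scores.sort(key=lambda p: p[1], reverse=True)
        graph[t] = scores[:k]
    return graph


# Hand-written stand-ins for LLM-generated "possible definitions" (hypothetical data).
DEFINITIONS = {
    "stoicism": ["a philosophy of virtue and self control",
                 "ancient school teaching calm acceptance of fate"],
    "mindfulness": ["practice of calm attention to the present",
                    "meditation teaching acceptance and attention"],
    "inflation": ["general rise in prices over time",
                  "loss of purchasing power of money"],
    "monetary policy": ["central bank control of money supply and interest rates",
                        "policy to manage prices and inflation"],
}

graph = knn_graph(DEFINITIONS, k=1)
for topic, neighbors in graph.items():
    print(topic, "->", [(n, round(s, 3)) for n, s in neighbors])
```

Note that "stoicism" and "mindfulness" never share a literal name; they end up linked only through overlapping words in their definitions, which is the point of embedding definitions rather than the entity strings themselves.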
HyDE sounds like an interesting approach. All dense retrieval approaches suffer from the problem you outlined in the blog post. Have you looked at keyword-based or late-interaction models for retrieval, such as ColBERTv2[1]? I find that late-interaction methods seem to offer the best trade-off between semantic intelligence (precision) and retrieving relevant documents (recall).
[1] https://github.com/stanford-futuredata/ColBERT
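For anyone unfamiliar with "late interaction": instead of squashing a query and a document into one vector each, ColBERT keeps one embedding per token and scores with MaxSim, i.e. each query token takes its best-matching document token and those maxima are summed. Below is a minimal pure-Python sketch of that scoring rule only, not the ColBERT library's API; the 2-d token vectors are made up for illustration, and `maxsim_score` / `mean_pooled_cosine` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: sum over query tokens of the max cosine to any doc token."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(sum(a * b for a, b in zip(qt, dt)) for dt in d) for qt in q)


def mean_pooled_cosine(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Single-vector baseline: mean-pool each side, then take one cosine."""
    def mean(vs: List[Vector]) -> List[float]:
        return [sum(col) / len(vs) for col in zip(*vs)]
    q, d = normalize(mean(query_tokens)), normalize(mean(doc_tokens))
    return sum(a * b for a, b in zip(q, d))


# Toy token embeddings (made up): doc_a contains a close match for every
# query token, while doc_b only has one "blended" token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_b = [[1.0, 1.0]]

print("MaxSim:", maxsim_score(query, doc_a), maxsim_score(query, doc_b))
print("pooled:", mean_pooled_cosine(query, doc_a), mean_pooled_cosine(query, doc_b))
```

In this toy case mean pooling gives both documents the same cosine, while MaxSim prefers `doc_a` because every query token finds an exact match there. That per-token matching is roughly where the precision side of the trade-off comes from; the cost is storing one vector per document token.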