by minxomat 1191 days ago
I'm building this, for (mostly) non-scientific non-fiction works (books, articles, news, etc.). Launching soon, with about 7,500 books indexed.

Generally, what I found useful for building a graph between "topics" or entities was using a HyDE[1] prompt to generate possible distinct definitions and then building a nearest-neighbor network from those. This successfully identifies concepts that are related in the truly abstract sense, rather than just literal entities.

[1] https://summarity.com/hyde
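
To make the idea concrete, here's a rough sketch of the pipeline described above, not the actual implementation. Everything named here is an assumption for illustration: `generate_definitions` is a hypothetical stand-in for the HyDE prompt (a real version would call an LLM), `embed` is a toy hashed bag-of-words vector standing in for a real embedding model, and averaging the definition vectors per topic is just one possible way to combine them.

```python
import hashlib
import numpy as np

# Hypothetical canned output; a real HyDE step would prompt an LLM
# for several distinct definitions of each topic.
CANNED_DEFINITIONS = {
    "inflation": [
        "a sustained rise in the general price level of goods and services",
        "the loss of purchasing power of money over time",
    ],
    "monetary policy": [
        "central bank actions that control the money supply and interest rates",
        "tools used to keep the price level and purchasing power of money stable",
    ],
    "photosynthesis": [
        "the process by which plants turn light water and carbon dioxide into sugar",
        "light driven production of chemical energy in chloroplasts",
    ],
}

def generate_definitions(topic):
    """Stand-in for the HyDE prompt (assumption, not a real API)."""
    return CANNED_DEFINITIONS.get(topic, [topic])

def embed(text, dim=64):
    """Toy hashed bag-of-words embedding; swap in a real model in practice."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def topic_vector(topic):
    """Average the embeddings of the generated definitions, then re-normalize."""
    vecs = np.stack([embed(d) for d in generate_definitions(topic)])
    mean = vecs.mean(axis=0)
    return mean / np.linalg.norm(mean)

def knn_graph(topics, k=2):
    """Link each topic to its k most similar topics by cosine similarity."""
    mat = np.stack([topic_vector(t) for t in topics])
    sims = mat @ mat.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-matches
    graph = {}
    for i, topic in enumerate(topics):
        nearest = np.argsort(-sims[i])[:k]
        graph[topic] = [topics[j] for j in nearest]
    return graph

if __name__ == "__main__":
    print(knn_graph(["inflation", "monetary policy", "photosynthesis"], k=1))
```

The point of embedding definitions instead of the topic names themselves is that two topics can end up as neighbors because their definitions overlap in meaning, even when the names share no words.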

2 comments

Hey @minxomat,

HyDE sounds like an interesting approach. All dense retrieval approaches suffer from the problem you outlined in the blog post. Have you looked at keyword-based or late-interaction models for retrieval, such as ColBERTv2[1]? I find that late-interaction methods seem to offer the best trade-off between semantic intelligence (precision) and retrieving relevant documents (recall).

[1] https://github.com/stanford-futuredata/ColBERT
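
For anyone unfamiliar with late interaction, here's a minimal sketch of the MaxSim scoring idea behind ColBERT-style models. This is not the ColBERT library's API: `maxsim_score` is a hypothetical helper, and the token embeddings are assumed to come from some encoder that isn't shown.

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take its best-matching
    document token by cosine similarity, then sum over query tokens."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T  # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

if __name__ == "__main__":
    # Made-up 2-d token embeddings, purely for illustration.
    query = np.array([[1.0, 0.0], [0.0, 1.0]])
    doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    doc_b = np.array([[1.0, 0.0]])
    print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

Because every query token is matched separately, a document only scores highly if it covers each part of the query, instead of everything being squeezed into a single vector per text.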

I'd love to chat! I am doing a rough version of this at Shepherd.com and laying the groundwork for more :) (ben@re-moveshepherd.com)

I haven't used Wikidata info yet, but I'm hoping to expand to that in 3 or 4 months.

Sure. Sent a ping.