
by minxomat 1191 days ago

I'm building this, for (mostly) non-scientific non-fiction works (books, articles, news, etc.). Launching soon, with about 7,500 books indexed.

Generally, what I found useful for building a graph between "topics" or entities was to use a HyDE[1] prompt to generate possible distinct definitions and then build a nearest-neighbor network from those. This successfully identifies related concepts in the truly abstract sense rather than literal entities.

[1] https://summarity.com/hyde
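
For readers who want to picture the nearest-neighbor step, here is a minimal sketch, not the actual pipeline described above. It assumes each topic's generated definition has already been embedded into a vector; the toy vectors, the topic names, and the `knn_graph` helper are all hypothetical stand-ins.

```python
import numpy as np

def knn_graph(embeddings, k):
    """Map each row index to the indices of its k nearest rows by cosine similarity."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-length rows
    sims = x @ x.T                                    # pairwise cosine similarity
    np.fill_diagonal(sims, -np.inf)                   # a node is not its own neighbor
    order = np.argsort(-sims, axis=1)[:, :k]          # most similar first
    return {i: order[i].tolist() for i in range(len(x))}

# Hypothetical toy vectors standing in for embeddings of generated definitions.
toy = {
    "inflation": [0.9, 0.1, 0.0],
    "monetary policy": [0.8, 0.2, 0.1],
    "photosynthesis": [0.0, 0.1, 0.9],
    "plant biology": [0.1, 0.0, 0.8],
}
names = list(toy)
graph = knn_graph(list(toy.values()), k=1)
edges = {names[i]: [names[j] for j in nbrs] for i, nbrs in graph.items()}
print(edges)
```

With these toy vectors the two economics topics link to each other and the two biology topics link to each other, even though no literal entity is shared between the names.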

2 comments

abhinavkulkarni 1191 days ago

Hey @minxomat,

HyDE sounds like an interesting approach. All dense retrieval approaches suffer from the problem you outlined in the blog post. Have you looked at keyword-based or late-interaction models for retrieval such as ColBERTv2[1]? I find that late-interaction methods seem to offer the best trade-off between semantic intelligence (precision) and retrieving relevant documents (recall).

[1] https://github.com/stanford-futuredata/ColBERT
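
As a rough illustration of what "late interaction" means, here is a sketch of the MaxSim scoring idea used by ColBERT-style models: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. This is not the ColBERT library's API; the `maxsim_score` helper and the toy token vectors are assumptions made up for the example.

```python
import numpy as np

def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best cosine similarity to any document token."""
    q = np.asarray(query_tokens, dtype=float)
    d = np.asarray(doc_tokens, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    sims = q @ d.T                       # shape: (query tokens, doc tokens)
    return float(sims.max(axis=1).sum())

# Hypothetical per-token embeddings (real models produce these with an encoder).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0]]                          # covers only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Because every query token is scored separately, a document that matches only part of the query ranks below one that covers all of it, which is where the token-level precision comes from.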


bwb 1191 days ago

I'd love to chat! I am doing a rough version of this at Shepherd.com and laying the groundwork for more :) (ben@re-moveshepherd.com)

I haven't used Wikidata info yet, but I'm hoping to expand to that in 3 or 4 months.


minxomat 1191 days ago

Sure. Sent a ping.
