HN Mirror



	by inductive_magic 877 days ago
	We're getting very solid results. Instead of performing RAG on the (vectorised) raw source texts, we create representations of the elements ("context clusters") contained within the source, which are then vectorised and ranked. That's all I can disclose; hope that helps.

3 comments

Merik 877 days ago

Thanks for your message. I should say that giving your comment to GPT-4, with a request for a solution architecture that could produce good results based on the comment, produced a very detailed, fascinating solution. https://chat.openai.com/share/435a3855-bf02-4791-97b3-4531b8...


isoprophlex 877 days ago

If only the thing could speak and summarize in plain English instead of hollow, overly verbose bulleted lists.


weird-eye-issue 877 days ago

A whole lot of noise


Merik 876 days ago

Maybe, but it expanded on the idea in the vague comment, and together they introduced me to the idea of embedding each sentence, clustering the sentences, and then taking the centroid of each cluster as the embedding to index/search against. I had not thought of doing that before.
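
A rough sketch of that idea, under stated assumptions rather than anyone's actual implementation: `embed` is a toy hashed bag-of-words stand-in for a real sentence-embedding model, plain k-means is my pick for the clustering step (the comment does not name an algorithm), and every function name here is hypothetical.

```python
import math
import zlib

DIM = 64  # size of the toy embedding space (assumption)


def embed(sentence):
    """Toy stand-in for a real embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word in sentence.lower().split():
        word = word.strip(".,;:!?\"'()")
        if word:
            vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def kmeans_labels(vectors, k, iterations=20):
    """Plain k-means, seeded with the first k vectors so runs are repeatable."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = centroid(members)
    return labels


def build_index(sentences, k):
    """Return (centroid_vector, member_sentences) pairs, one per non-empty cluster."""
    vectors = [embed(s) for s in sentences]
    labels = kmeans_labels(vectors, k)
    index = []
    for c in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == c]
        index.append(
            (centroid([vectors[i] for i in members]), [sentences[i] for i in members])
        )
    return index


def search(query, index):
    """Return the sentences of the cluster whose centroid is most similar to the query."""
    q = embed(query)
    best = max(index, key=lambda entry: cosine(q, entry[0]))
    return best[1]


sentences = [
    "The cat sat on the mat.",
    "Interest rates rose again this quarter.",
    "A cat chased the mouse.",
    "The central bank raised interest rates.",
]
index = build_index(sentences, k=2)
print(search("Why did the bank change interest rates?", index))
```

The search only compares the query against one vector per cluster instead of one per sentence, which is the appeal; how well the clusters line up with topics depends entirely on the embedding model, and the toy one here is only good enough to show the mechanics.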


reerdna 877 days ago

Sounds a little like this recent paper:

"RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval"

https://arxiv.org/abs/2401.18059


falling_myshkin 877 days ago

After seeing raw source text performance, I agree that representation learning of higher-level semantic "context clusters", as you say, seems like an interesting direction.
