by inductive_magic 830 days ago
We're getting very solid results.

Instead of performing RAG on the (vectorised) raw source texts, we create representations of the elements ("context clusters") contained within the source, and those representations are then vectorised and ranked. That's all I can disclose; hope that helps.

3 comments

Thanks for your message. I should say that giving your comment to GPT-4, with a request for a solution architecture that could produce good results based on the comment, produced a very detailed, fascinating solution. https://chat.openai.com/share/435a3855-bf02-4791-97b3-4531b8...
If only the thing could speak and summarize in plain English instead of hollow, overly verbose bulleted lists.
A whole lot of noise
Maybe, but it expanded on the idea in the vague comment, and together they introduced me to the idea of embedding each sentence, clustering the sentence embeddings, and then taking the centroid of each cluster as the embedding to index and search against. I had not thought of doing that before.
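For anyone who wants to try it, here is a rough sketch of that sentence/cluster/centroid idea. To be clear, this is my own guess at one way to do it, not what the parent commenter's system does (they didn't disclose that). `toy_embed` is a hypothetical stand-in for a real sentence-embedding model (it is just a hashed bag of words), the k-means is a bare-bones NumPy version, and all function names and parameters (`dim`, `k`, `iters`) are made up for illustration.

```python
import hashlib
import numpy as np


def toy_embed(sentence, dim=64):
    """Hypothetical stand-in for a real embedding model:
    hashed bag-of-words vector, L2-normalised."""
    vec = np.zeros(dim)
    for word in sentence.lower().split():
        word = word.strip(".,;:!?\"'()")
        if not word:
            continue
        idx = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def assign(X, centroids):
    """Label each row of X with the index of its nearest centroid."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(X, centroids)


def build_index(sentences, k):
    """Embed every sentence, cluster, and keep one centroid per cluster."""
    X = np.stack([toy_embed(s) for s in sentences])
    return kmeans(X, k)


def search(query, centroids, labels, sentences):
    """Rank centroids by cosine similarity; return the best cluster's sentences."""
    q = toy_embed(query)
    denom = np.linalg.norm(centroids, axis=1) * np.linalg.norm(q)
    denom[denom == 0] = 1.0  # avoid division by zero for empty vectors
    sims = (centroids @ q) / denom
    best = int(sims.argmax())
    return best, [s for s, lab in zip(sentences, labels) if lab == best]


if __name__ == "__main__":
    docs = [
        "Cats sleep for most of the day.",
        "Cats chase mice at night.",
        "Vector indexes store embeddings.",
        "Embeddings are searched with vector indexes.",
    ]
    centroids, labels = build_index(docs, k=2)
    print(search("what do cats do", centroids, labels, docs))
```

The point of the design is that the query is compared against k centroids rather than against every sentence, and the matching cluster's sentences are returned together as context. With a real embedding model and a proper clustering library the structure would presumably stay the same; only `toy_embed` and `kmeans` would be swapped out.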
Sounds a little like this recent paper:

"RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval"

https://arxiv.org/abs/2401.18059

After seeing how raw source text performs, I agree that representation learning over higher-level semantic "context clusters", as you call them, seems like an interesting direction.