
by visarga 959 days ago

> To address this, I implemented a strategy of tagging messages to create and utilize categories.

I think before RAG we need to do more legwork with the LLM on the raw text. Here is one of my blog posts that is related:

https://mindmachina.wixsite.com/ai-blog/post/the-promise-of-...

The idea is to generate chain-of-thought annotations for your raw texts. These annotations improve embedding and retrieval by making implicit information explicit.

For example, "the last letter of this message" would not embed similarly to "e", but if the message were annotated with CoT, it would.
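Here is a toy illustration of that point. It assumes a bag-of-words vector as a stand-in for a real embedding model and a hand-written `annotate` function in place of the LLM. Both are hypothetical simplifications. The query "e" shares nothing with the raw message, but it matches once an annotation states the last letter explicitly.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def annotate(text: str) -> str:
    """Hypothetical annotator. In practice an LLM would write these notes;
    here one implicit fact is made explicit by hand."""
    letters = [c for c in text.lower() if c.isalpha()]
    last = letters[-1] if letters else "?"
    return f"The last letter of this message is {last}."


raw = "Meet me at the usual place"
annotated = raw + "\n" + annotate(raw)
query = "e"

print(round(cosine(embed(query), embed(raw)), 3))        # → 0.0 (no shared words)
print(round(cosine(embed(query), embed(annotated)), 3))  # → 0.25
```

Real embedding models are far less literal than word counts, but the failure mode is similar: a fact the text only implies is hard to retrieve until something writes it down.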


kristiandupont 959 days ago

Interesting, thank you!

I think a lot about the cost of the loop, mostly in terms of time. I don't want the bot to take too long to respond. That's why dream cycles seem like an obvious solution for some of the heavier work. I guess it would make sense to combine those with your idea: "given what I know about the user, what should I study?", especially if it has access to an "enhanced" knowledge DB like you suggest.


visarga 959 days ago

Yes, it would be a good idea for an agent to first collect user interests and later, when ingesting data into the RAG system, annotate it with useful metadata such as topic, summary, entities, and question-answer pairs related to the user's interests. Whatever we want to ask later had better be made explicit in the text.
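Here is a minimal sketch of that ingestion step. It assumes the LLM is reachable through a plain `llm(prompt) -> str` callable that returns JSON. The prompt wording, `AnnotatedChunk`, `annotate_chunk`, and `fake_llm` are hypothetical names, not any particular library's API. The annotations are appended to the raw text so the embedding sees them.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical prompt; user interests steer which Q&A pairs get written.
PROMPT = """Annotate the text below for later retrieval.
User interests: {interests}
Return JSON with keys: topic, summary, entities, qa_pairs
(qa_pairs: list of [question, answer] the user is likely to ask).

Text:
{text}"""


@dataclass
class AnnotatedChunk:
    text: str
    topic: str
    summary: str
    entities: list[str]
    qa_pairs: list[tuple[str, str]] = field(default_factory=list)

    def embedding_text(self) -> str:
        """Raw text plus the explicit annotations, joined into one string to embed."""
        parts = [self.text, f"Topic: {self.topic}", f"Summary: {self.summary}",
                 "Entities: " + ", ".join(self.entities)]
        parts += [f"Q: {q}\nA: {a}" for q, a in self.qa_pairs]
        return "\n".join(parts)


def annotate_chunk(text: str, interests: list[str], llm: Callable[[str], str]) -> AnnotatedChunk:
    """Ask the (assumed) LLM for metadata and parse its JSON reply."""
    reply = json.loads(llm(PROMPT.format(interests=", ".join(interests), text=text)))
    return AnnotatedChunk(
        text=text,
        topic=reply["topic"],
        summary=reply["summary"],
        entities=list(reply["entities"]),
        qa_pairs=[(q, a) for q, a in reply["qa_pairs"]],
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned JSON reply."""
    return json.dumps({
        "topic": "travel",
        "summary": "Alice plans a trip to Lisbon in May.",
        "entities": ["Alice", "Lisbon"],
        "qa_pairs": [["When is Alice going to Lisbon?", "In May."]],
    })


chunk = annotate_chunk("Alice: booked flights, leaving May 3rd for Lisbon!", ["travel"], fake_llm)
print(chunk.embedding_text())
```

In a real pipeline `fake_llm` would be replaced by an actual model call. The reply should also be validated, since models don't always return well-formed JSON.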
