Hacker News new | ask | show | jobs
by sdesol 260 days ago
I haven't looked at the code, but it might do what I do with my chat app which is talked about at https://github.com/gitsense/chat/blob/main/packages/chat/wid...

The basic idea is, you don't search for a single term but rather you search for many. Depending on the instructions provided in the "Query Construction" stage, you may end up with a very high level search term like beverage or you may end up with terms like 'hot-drinks', 'code-drinks', etc.

Once you have the query, you can do a "Broad Search" which returns an overview of the message and from there the LLM can determine which messages it should analyze further if required.

Edit.

I should add, this search strategy will only work well if you have a post message process. For example, after every message save/upddate, you have the LLM generate an overview. These are my instructions for my tiny overview https://github.com/gitsense/chat/blob/main/data/analyze/tiny... that is focused on generating the purpose and keywords that can be used to help the LLM define search terms.

2 comments

That’s going to be incredibly fragile. You could fix it by giving the query term a bunch of different scores, e.g. its caffeine-ness, bitterness, etc. and then doing a likeness search across these many dimensions. That would be much less fragile.

And now you’ve reinvented vector embeddings.

You could instruct the LLM to classify messages with high level tags like for coffee, drinks, etc. always include beverage.

Given how fast interference has become and given current supported context window sizes for most SOTA models, I think summarizing and having the LLM decide what is relevant is not that fragile at all for most use cases. This is what I do with my analyzers which I talk about at https://github.com/gitsense/chat/blob/main/packages/chat/wid...

Inference is not fast by any metric. It is many, MANY orders of magnitude slower than alternatives.
Honestly Gemini Flash Lite and models on Cerebras are extremely fast. I know what you are saying. If the goal is to get a lot of results where they may or may not be relevant, then yes, it is an order of a magnitude slower.

If you take into consideration the post analysis process, which is what inference is trying to solve, is it an order of a magnitude slower?

More like 6-8 orders of magnitude slower. That’s a very nontrivial difference in performance!
How are you quantify the speed at which results are reviewed?
It has become fast enough that another call isn't going to overwhelm your pipeline. If you needed this kind of functionality for performance computing perhaps it wouldn't be feasible, but it is being used to feed back into an LLM. The user will never notice.
Your readmes did a great job at answering my question "why is this file called 1.md? What calls this?" when I searched for "1.md". (The answer is 1=user, 2=assistant, and it allows adding other analyzers with the same structure.)
I'm guessing you are referring to https://github.com/gitsense/chat/tree/main/data/analyze or https://github.com/gitsense/chat/tree/main/packages/chat/wid...

The number is actually the order in the chat so 1.md would be the first message, 2.md would be the second and so forth.

If you goto https://chat.gitsense.com and click on the "Load Personal Help Guide" you can see how it is used. Since I want you to be able to chat with the document, I will create a new chat tree and use the directory structure and the 1,2,3... markdown files to determine message order.

https://github.com/gitsense/chat/blob/129210302ec06985bbd103... also says "put a 1.md here and the modular plugin structure will know to call it".