Hacker News
by avereveard 917 days ago
Fully agree: vector search in embedding space is insufficient if you are working with a single document domain (e.g. they are all fish restaurant menus), and then the only thing that can save you is text search. Just make sure the underlying database supports synonym lists and normalization in the languages you plan to use.
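To make the synonym and normalization point concrete, here is a minimal sketch of what that layer does before a query hits the text index. The `SYNONYMS` table and both function names are made up for illustration; a real database would handle this through its own dictionary and analyzer configuration.

```python
import unicodedata

# Hypothetical synonym list for a fish-restaurant corpus; real lists are domain-specific.
SYNONYMS = {
    "sea bass": ["branzino"],
    "prawn": ["shrimp"],
}


def normalize(text: str) -> str:
    """Case-fold and strip diacritics, so 'Fumé' and 'fume' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_query(query: str) -> list[str]:
    """Return the normalized query plus one variant per known synonym."""
    base = normalize(query)
    variants = [base]
    for term, alternatives in SYNONYMS.items():
        if term in base:
            variants.extend(base.replace(term, alt) for alt in alternatives)
    return variants
```

Each variant would then be sent to the full-text index, or OR-ed into a single query, depending on what the backend supports.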

About the "bad news" section.

You can do that today by just asking the LLM, using the ReAct pattern. Give it the database schema and a few-shot prompt, and it will happily decide to build a query, read the titles, run more queries if the titles aren't relevant enough, then fetch the content of the relevant titles and use it to form an opinion.
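A rough sketch of how such a prompt could be assembled. The schema, the worked example, and the `search[...]` / `retrieve[...]` action syntax are all assumptions made up for illustration, not something a particular model requires.

```python
# Hypothetical schema and worked example; replace both with your own.
SCHEMA = "menus(id INTEGER, title TEXT, content TEXT)"

FEW_SHOT_EXAMPLE = """\
Question: Which menu has a vegetarian option?
Thought: I should search the titles first.
Action: search[vegetarian]
Observation: Lunch menu; Tasting menu
Thought: The lunch menu looks relevant, I will read it.
Action: retrieve[Lunch menu]
Observation: ... grilled vegetables with polenta ...
Thought: I have enough information.
Final Answer: The lunch menu offers grilled vegetables with polenta."""


def build_react_prompt(question: str,
                       schema: str = SCHEMA,
                       examples: str = FEW_SHOT_EXAMPLE) -> str:
    """Combine tool descriptions, schema, and a few-shot example into one prompt."""
    return (
        "Answer the question using the tools below.\n"
        "Tools: search[query] returns matching titles; "
        "retrieve[title] returns a document.\n"
        f"Database schema: {schema}\n\n"
        f"{examples}\n\n"
        f"Question: {question}\n"
        "Thought:"  # left open so the model continues with its first thought
    )
```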

This may not seem fast, but there are 7B-parameter models that can do it today at 150+ tokens/second.

2 comments

I think a model could do some basic eval but there are too many hidden assumptions for it to do especially well.
please elaborate, thanks.
this is an example: https://platform.openai.com/playground/p/HpFda4ZRXjbbanBwG35...

it's a ReAct loop with search and retrieve actions, where I'm simulating the tools by hand. in prod, you'd pick up the output of the Action, run the matching callback with the input the LLM provided, get the result, and pass it back as 'Observation:'. for the sake of this demo I'm doing exactly that, but manually copy-pasting out of Wikipedia.

works with more or less any backend, and the llm is smart enough to change direction if a search doesn't produce relevant results (you can see it in the demo). here the loop is cut short because I was running it manually, but you can see the important bits.

just implement a retrieve and a search function for whatever data source you have, vector or full text, and a couple of regexes to extract the actions and the final answer.
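a minimal sketch of that loop, assuming the model emits lines like `Action: search[some query]` and `Final Answer: ...` (that syntax, the `react_loop` name and the `llm` / `tools` parameters are assumptions for illustration; `llm` is any function that takes the transcript so far and returns the next completion):

```python
import re
from typing import Callable, Dict, Optional

# Assumed output format: "Action: name[argument]" and "Final Answer: text".
ACTION_RE = re.compile(r"^Action:[ \t]*(\w+)\[(.*?)\][ \t]*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)", re.MULTILINE | re.DOTALL)


def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               preamble: str = "",
               max_steps: int = 5) -> Optional[str]:
    """Run Thought/Action/Observation rounds until a final answer appears."""
    transcript = f"{preamble}Question: {question}\n"
    for _ in range(max_steps):
        completion = llm(transcript)
        action = ACTION_RE.search(completion)
        if action:
            # Keep text only up to the action, dropping anything the model
            # invented after it (such as a made-up observation).
            transcript += completion[:action.end()] + "\n"
            name, argument = action.groups()
            tool = tools.get(name)
            observation = tool(argument) if tool else f"unknown action {name!r}"
            transcript += f"Observation: {observation}\n"
            continue
        final = FINAL_RE.search(completion)
        if final:
            return final.group(1).strip()
        break  # neither an action nor an answer: give up
    return None
```

`tools` would be something like `{"search": search, "retrieve": retrieve}`, each backed by your vector or full-text store. Setting a stop sequence on "Observation:" in the LLM call, where the API supports one, avoids paying for the text that gets truncated here.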

pro tip: use an expensive llm to run the react loop, and a cheaper llm to summarize article content after retrieval and before passing it in as an observation. ideally you'd run something like "this is a document {document} on this topic: {last_thought}, extract the information relevant to the user question: {question}" through the cheap llm, so you put the fewest tokens into the react loop.
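a sketch of that summarization step, using the prompt quoted above as the template. `cheap_llm` is a placeholder for whatever completion function you use, and the `max_chars` cut-off is an arbitrary assumption to keep the cheap model's input bounded:

```python
from typing import Callable

SUMMARY_TEMPLATE = (
    "this is a document {document} on this topic: {last_thought}, "
    "extract the information relevant to the user question: {question}"
)


def summarize_for_observation(document: str,
                              last_thought: str,
                              question: str,
                              cheap_llm: Callable[[str], str],
                              max_chars: int = 8000) -> str:
    """Condense a retrieved document before it enters the ReAct transcript."""
    prompt = SUMMARY_TEMPLATE.format(
        document=document[:max_chars],  # crude truncation, an assumption
        last_thought=last_thought,
        question=question,
    )
    return cheap_llm(prompt).strip()
```

the return value is what you'd pass back as the 'Observation:' instead of the raw article text.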