
by 7thpower 798 days ago

This is a great intro. I am amazed how many people don’t use LLMs to analyze the questions themselves and apply filters to avoid pulling back irrelevant documents in the first place.

We run as many methods as practical in parallel (SQL, vector, full text, etc.) and return the first one that meets our threshold. Vector search is almost never the winner relative to full text.
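To illustrate the "run in parallel, return the first one that meets our threshold" pattern, here is a minimal sketch. It is not the commenter's actual code: the `Retriever` signature, the relevance score each method returns, the timeout, and the stub retrievers are all assumptions made up for the example.

```python
import concurrent.futures as cf
import time
from typing import Callable, Optional

# Hypothetical retriever signature: query -> (relevance score, documents).
Retriever = Callable[[str], tuple[float, list[str]]]


def first_above_threshold(
    query: str,
    retrievers: dict[str, Retriever],
    threshold: float,
    timeout: float = 5.0,
) -> Optional[tuple[str, list[str]]]:
    """Run every retriever concurrently and return (name, docs) for the
    first one to finish with a score >= threshold, or None if none does."""
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(retrievers)))
    futures = {pool.submit(fn, query): name for name, fn in retrievers.items()}
    try:
        for fut in cf.as_completed(futures, timeout=timeout):
            try:
                score, docs = fut.result()
            except Exception:
                continue  # one failing method should not sink the others
            if score >= threshold:
                return futures[fut], docs
    except cf.TimeoutError:
        pass  # nothing qualified within the time budget
    finally:
        # Do not wait for slower methods; drop any that have not started.
        pool.shutdown(wait=False, cancel_futures=True)
    return None


# Stub retrievers standing in for real SQL / full-text / vector backends.
def sql_lookup(q: str) -> tuple[float, list[str]]:
    return 0.2, []  # fast, but below the threshold


def full_text(q: str) -> tuple[float, list[str]]:
    time.sleep(0.01)
    return 0.9, ["spec-A"]


def vector(q: str) -> tuple[float, list[str]]:
    time.sleep(0.2)
    return 0.95, ["spec-B"]


winner = first_above_threshold(
    "example query",
    {"sql": sql_lookup, "full_text": full_text, "vector": vector},
    threshold=0.8,
)
```

Note that "first" here means first to finish, not best score, so a slower method with a higher score loses to a faster one that clears the bar. That matches the wording above, but whether the scores of different methods are comparable on one threshold is a separate question this sketch does not address.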

Instead, I see a lot of people in sister companies using the most robust models they can find and having agents do chain of thought, while their users are wondering when, if ever, they’ll get a response back.

3 comments

schmidt_fifty 798 days ago

> Vector search is almost never the winner relative to full text.

Full text search is certainly the winner in the time dimension, but can it compete in quality? Presumably which method is likely to provide relevant results depends greatly on the query. Invoking LLMs to pre-process the query and select a retrieval method is going to be quite expensive compared to each of the search methods.


7thpower 798 days ago

I mean from a retrieval quality perspective, not a latency perspective. Search latency is not a constraint because the long pole in the tent for us is always the user facing model.

We also have a lot of numbers in our customer requests, which do not typically play to the strengths of the vector searches.

COGS is not a large concern as our audience is internally facing along with a few of our partners, so inference and infrastructure costs are nothing compared to engineering time, as we don’t have a way to amortize our costs across a bunch of customers.

It is also a very high value use case for us.

The other factor is that we’re using fast and cheap models like Haiku and Mixtral to do the preprocessing before we hand things to the retrieval steps, so it’s not much of a cost driver.


treprinum 798 days ago

We are optimizing for latency, and vector search is sufficient in 80-90% of cases; 0.6s is about the threshold for an acceptable end-user experience. Hybrid search with SPLADE is marginally better but it limits the number of human languages we can use. I am wondering when full-text is better than vector search outside of very specific keywords.


7thpower 798 days ago

Latency of search isn’t much of a concern; I was speaking to quality but did not word it well.

We have just found that vector search does not play well with numbers and does not provide consistent results, so we end up needing more chunks, which results in compounding token usage, slower responses, and higher chances of incorrect responses due to the customer-facing model getting confused by similar results. I’m sure we could optimize our approach, but full text has worked far more reliably than expected, so we have invested more resources into how we handle documents, latency reduction, and pulling in structured data.
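One way to picture why exact matching helps when documents differ only in a few numbers: pull the numeric tokens out of the query and keep only the candidates that contain all of them verbatim. This is an illustrative sketch, not the approach described in this thread; the regex, the function names, and the sample documents are hypothetical.

```python
import re

# Matches integers and simple decimals such as "100" or "0.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numeric_tokens(text: str) -> set[str]:
    """Return the set of numeric strings that appear in the text."""
    return set(NUMBER.findall(text))


def filter_by_numbers(query: str, docs: list[str]) -> list[str]:
    """Keep only documents containing every number mentioned in the query."""
    wanted = numeric_tokens(query)
    return [d for d in docs if wanted <= numeric_tokens(d)]


# Two near-identical, made-up documents that differ in a single figure.
docs = [
    "Resistor R-100, tolerance 0.5%",
    "Resistor R-100, tolerance 0.1%",
]
matches = filter_by_numbers("R-100 with 0.5 tolerance", docs)
```

The comparison is on the literal strings, so "1.50" and "1.5" would not match each other; a real system would need unit and format normalization on top of this.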


cpursley 798 days ago

This sounds really interesting. Do you have any longer-form writeup on this approach (or could you point us towards related info)?


7thpower 798 days ago

I do not, but my Twitter handle is in my profile and I am always more than happy to hop on a call and share what I know.

For reference our subject matter is engineering specs for high precision electronics manufacturing. We have ~100k products and a lot of them have identical documentation except for a few figures (which make all the difference in the world), so it’s a challenging use case that is very unforgiving. Totally doable though and the basis for a lot of capabilities we’ll be investing in moving forward.

Happy to share, as I think we’re ahead in a few areas but believe others will catch up, and we’ve learned so much from others willing to share info, so we always try to pay it forward.
