Hacker News new | ask | show | jobs
by safety1st 71 days ago
We were given a demo of a vector based approach, and it didn't work. They said our docs were too big and for some reason their chunking process was failing. So we ended up using a good old fashioned Elastic backend because that's what we know, and simply forwarding a few of these giant documents to the LLM verbatim along with the user's question. The results have been great, not a single complaint about accuracy, results are fast and cheap using OpenAI's micro models, Elastic is mature tech everyone understands so it's easy to maintain.

I think this turned out to be one of those lessons about premature optimization. It didn't need to be as complex as what people initially assumed. Perhaps with older models it would have been a different story.

1 comments

> They said our docs were too big and for some reason their chunking process was failing.

Why would the size of your docs have any bearing on whether or not the chunking process works? That makes no sense. Unless of course they're operating on the document entirely in memory which seems not very bright unless you're very confident of the maximum size of document you're going to be dealing with.

(I implemented a RAG process from scratch a few weeks ago, having never done so before. For our use case it's actually not that hard. Not trivial, but not that hard. I realise there are now SaaS RAG solutions but we have almost no budget and, in any case, data residence is a huge concern for us, and to get control of that you generally have to go for the expensive Enterprise tier.)

I agree it makes no sense. The whole point of chunking is to handle large documents. If your chunking system fails because a document is too big, that seems like a pretty glaring omission. I just chalked it up to the tech being new and novel and therefore having more bugs/people not fully understanding how it worked/etc. It was a vendor and they never gave us more details.

Not all problems have to be solved. We just fell back to using older, more proven technology, started with the simplest implementation and iterated as needed, and the result was great.

That's good. I think if you can get the result you need with a technology that's already familiar to you then, in cases where that tech is still supported, that's going to be a win.

RAG worked well for us in this recent case but, in 3+ years of developing LLM backed solutions, it's the first time I've had to reach for it.