by fergal_reid 900 days ago
>The retrieval part is way more important.

I don't agree with this - at Intercom we've put a lot of work into our Fin chatbot, which uses a RAG architecture, and we're still using GPT-4 for the generation part.

GPT-4 is a really powerful and expensive model, but we find we need this power to 1) reduce hallucinations to an acceptable level, and 2) keep the quality of inferences made from the retrieved text high.

Now, our bot is answering customer support questions unsupervised, and maybe it'd be different for a human-in-the-loop system. But at least in our case, having benchmarked this thoroughly, we feel we need a very powerful generation model to reduce errors.

We've also done work on the retrieval end of things, including a customised model, but found the generation side is where we need the most capable models.

1 comment

That's interesting, thanks. My experience is with technical documentation Q&A, returning summaries and relevant passages. My takeaway was that the summary is basically as good as the passages. I do think overall response quality is very subjective and really depends on how it's being used, so whatever users do best with wins the day.