by joewferrara 973 days ago
It's true that there are not a lot of datasets for benchmarking RAG. RAG applications are so tailored to their specific data and use case that a single benchmark dataset is not useful across different RAG applications. The data behind a RAG application could be Slack messages, technical documentation, insurance policies, internal company Microsoft Word documents, or a combination of these. Each of these data sources would need a very different benchmark dataset.

We recommend that developers building a RAG application create a benchmark dataset tailored specifically to the data and the use case of that application.
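
As a rough illustration of what a tailored benchmark could look like, here is a minimal Python sketch: a handful of questions written against your own documents, each labeled with the document IDs that should be retrieved, scored with recall@k. Everything in it is hypothetical and only for illustration (the `BenchmarkItem` type, the toy `DOCS` corpus, and the `keyword_retriever` stand-in); a real setup would plug in the application's actual retriever and its own data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class BenchmarkItem:
    """One hand-labeled example: a question and the docs that answer it."""
    question: str
    relevant_doc_ids: frozenset[str]


# Hypothetical corpus standing in for the application's real data.
DOCS = {
    "policy-1": "water damage is covered up to the policy limit",
    "policy-2": "flood damage requires a separate rider",
    "slack-1": "deploy freeze starts friday at noon",
}

# Hypothetical benchmark items written against that corpus.
BENCHMARK = [
    BenchmarkItem("is water damage covered", frozenset({"policy-1"})),
    BenchmarkItem("when does the deploy freeze start", frozenset({"slack-1"})),
]


def keyword_retriever(question: str, k: int) -> list[str]:
    """Toy retriever: rank docs by number of words shared with the question."""
    q_tokens = set(question.lower().split())
    scored = [
        (len(q_tokens & set(text.lower().split())), doc_id)
        for doc_id, text in DOCS.items()
    ]
    # Highest overlap first; ties broken by doc ID so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def recall_at_k(
    items: Sequence[BenchmarkItem],
    retrieve: Callable[[str, int], Sequence[str]],
    k: int = 5,
) -> float:
    """Mean fraction of each item's relevant docs found in the top-k results."""
    if not items:
        raise ValueError("benchmark is empty")
    total = 0.0
    for item in items:
        retrieved = set(retrieve(item.question, k))
        total += len(retrieved & item.relevant_doc_ids) / len(item.relevant_doc_ids)
    return total / len(items)


if __name__ == "__main__":
    print(recall_at_k(BENCHMARK, keyword_retriever, k=1))
```

Swapping `keyword_retriever` for a different retriever with the same `(question, k)` signature gives a like-for-like comparison on your own data, which is the part a generic benchmark cannot provide.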

1 comment

Well, we could say the same about question answering, information retrieval, and even LLMs, yet there are plenty of benchmarks for these. The point of a benchmark is not that it covers all use cases, but that it is agreed upon as a meaningful way to compare different approaches. I suppose I should dig into RAG papers accepted at ICLR/ICML/NeurIPS and look at their experiments sections.