Hey HN, Joe and Ethan from Tonic.ai here. We just released a new open-source Python package for evaluating the performance of Retrieval Augmented Generation (RAG) systems.

Earlier this year, we started developing a RAG-powered app to enable companies to talk to their free-text data safely. During our experimentation, however, we realized that because the method is so new, there were no industry-standard metrics for measuring the accuracy of a RAG system. We built Tonic Validate Metrics (tvalmetrics, for short) to easily calculate the benchmarks we needed to meet in building our RAG system.

We're sharing this Python package with the hope that it will be as useful for you as it has been for us and become a key part of the toolset you use to build LLM-powered applications. We also made Tonic Validate Metrics open-source so that it can thrive and evolve with your contributions! Please take it for a spin and let us know what you think in the comments.

Docs: https://docs.tonic.ai/validate
Repo: https://github.com/TonicAI/tvalmetrics
Tonic Validate: https://validate.tonic.ai
https://ai.google.com/research/NaturalQuestions
But I don't see this dataset mentioned much in RAG discussions.