by jy2947 833 days ago
Please correct me if I misunderstand how it works. Basically, the input to continuous-eval is a "dataset" (for example, a JSONL file with the inputs and outputs of a "retriever" step, and potentially golden data too). One example use case: an existing RAG pipeline continuously writes its data to the dataset, and continuous-eval continuously calculates the metrics on it.

That's correct, but let me dig a little deeper. Continuous-eval provides two types of metrics: reference-based and reference-free.

In the case of reference-based metrics, you provide a dataset with input/expected-output pairs for each step of the pipeline, and the metrics measure how closely each step's actual output matches the expected one. This is the best approach for offline evaluation (e.g., in CI/CD), and it is the one that best captures the alignment between what you expect and how the pipeline actually behaves.
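
To make the reference-based idea concrete, here is a minimal sketch in plain Python. It does not use the continuous-eval API; the record fields (`question`, `retrieved`, `expected`) and the function names are hypothetical, chosen only for illustration. It scores a retriever step against golden data with set-based precision and recall.

```python
import json

def retrieval_precision_recall(retrieved, expected):
    """Precision and recall of retrieved chunk IDs against the expected (golden) IDs."""
    retrieved_set, expected_set = set(retrieved), set(expected)
    hits = len(retrieved_set & expected_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(expected_set) if expected_set else 0.0
    return precision, recall

# Hypothetical JSONL dataset: one record per query, holding the retriever's
# actual output and the golden output. The field names are assumptions.
DATASET = """\
{"question": "q1", "retrieved": ["a", "b", "c"], "expected": ["a", "c"]}
{"question": "q2", "retrieved": ["d"], "expected": ["d", "e"]}
"""

def evaluate(jsonl_text):
    """Average precision and recall over every record in the dataset."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    scores = [retrieval_precision_recall(r["retrieved"], r["expected"]) for r in rows]
    n = len(scores)
    return {
        "precision": sum(p for p, _ in scores) / n,
        "recall": sum(r for _, r in scores) / n,
    }

results = evaluate(DATASET)
```

In a CI/CD job you could then fail the build when, say, `results["recall"]` drops below a threshold you pick.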

In the case of reference-free metrics, on the other hand, you don't need to provide the expected output. You can still use them to monitor the application and get directional insight into its performance.
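
As a rough illustration of the reference-free side, the sketch below computes a score from the pipeline's own inputs and outputs only, with no expected answer. The token-overlap score is a crude stand-in that I am assuming for the example, not a metric taken from continuous-eval, and `context_overlap` and `RollingMonitor` are hypothetical names.

```python
import re
from collections import deque

def token_set(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def context_overlap(answer, contexts):
    """Fraction of answer tokens that also appear in the retrieved contexts.

    Needs no expected output: it only compares the generated answer
    with what the retriever returned.
    """
    answer_tokens = token_set(answer)
    if not answer_tokens or not contexts:
        return 0.0
    context_tokens = set().union(*(token_set(c) for c in contexts))
    return len(answer_tokens & context_tokens) / len(answer_tokens)

class RollingMonitor:
    """Keeps the most recent scores so a running mean can be tracked over time."""

    def __init__(self, window=100):
        self.scores = deque(maxlen=window)

    def record(self, answer, contexts):
        score = context_overlap(answer, contexts)
        self.scores.append(score)
        return score

    def mean(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

monitor = RollingMonitor(window=50)
monitor.record("Paris is the capital", ["The capital of France is Paris."])
monitor.record("Berlin", ["Paris is nice"])
```

A score like this is only directional: a falling rolling mean suggests something changed in the pipeline, but it does not tell you whether the answers are correct.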