

by antonap 872 days ago

That's correct, but let me dig a little deeper. Continuous-eval provides two types of metrics: reference-based and reference-free.

With reference-based metrics, you provide a dataset of input/expected-output pairs for each step of the pipeline and use the metrics to measure how well each step performs. This is the best approach for offline evaluation (e.g., in CI/CD), and it is the one that best captures the alignment between what you expect and what the pipeline actually does.
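To make the reference-based idea concrete, here is a minimal sketch in plain Python. It is not continuous-eval's API: the function names, the dataset keys ("question", "expected_docs"), and the toy retriever are all assumptions made up for illustration. It scores one pipeline step (retrieval) against expected outputs.

```python
# Hypothetical sketch of a reference-based check; not continuous-eval's API.

def retrieval_precision_recall(retrieved, expected):
    """Compare the retrieved doc ids against the expected (reference) doc ids."""
    retrieved_set, expected_set = set(retrieved), set(expected)
    hits = len(retrieved_set & expected_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(expected_set) if expected_set else 0.0
    return precision, recall


def evaluate_retriever(retriever, dataset):
    """Average precision and recall of one pipeline step over a labeled dataset."""
    scores = [
        retrieval_precision_recall(retriever(ex["question"]), ex["expected_docs"])
        for ex in dataset
    ]
    n = len(scores)
    return {
        "precision": sum(p for p, _ in scores) / n,
        "recall": sum(r for _, r in scores) / n,
    }


# Toy stand-in for the real retrieval step (assumption for the example).
def toy_retriever(question):
    return ["doc1", "doc2"] if "refund" in question else ["doc9"]


dataset = [
    {"question": "How do I get a refund?", "expected_docs": ["doc1"]},
    {"question": "Where is my order?", "expected_docs": ["doc9"]},
]
report = evaluate_retriever(toy_retriever, dataset)
```

In CI you could then fail the build when, say, report["recall"] drops below a threshold you pick for your own pipeline.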

With reference-free metrics, on the other hand, you don't need to provide the expected output. You can still use them to monitor the application and get directional insight into its performance.
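As a rough illustration of a reference-free signal, the sketch below scores an answer using only what the pipeline itself produced (the answer and the retrieved context), with no expected output. Again, this is not continuous-eval's API; the token-overlap heuristic and the function names are assumptions, and a real metric would likely be more sophisticated.

```python
# Hypothetical sketch of a reference-free check; not continuous-eval's API.
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_overlap(answer, contexts):
    """Fraction of answer tokens that also appear in the retrieved contexts.

    A crude groundedness proxy: a low value suggests the answer may not be
    supported by what was retrieved. No expected answer is needed.
    """
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set()
    for context in contexts:
        context_tokens |= tokens(context)
    return len(answer_tokens & context_tokens) / len(answer_tokens)


score = context_overlap(
    "Paris is the capital",
    ["The capital of France is Paris"],
)
```

Because it needs no labels, a score like this can be logged on live traffic and watched for drift, which is the "directional insight" use case rather than a pass/fail test.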