Show HN: Continuous-eval – Granular evaluation of GenAI pipelines


Show HN: Continuous-eval – Granular evaluation of GenAI pipelines (github.com)

10 points by antonap 841 days ago

Hi HN, we are the creators of “continuous-eval”, an open-source tool for testing and evaluating generative AI apps.

"Continuous-eval" came from our efforts to measure, validate and improve the reliability of a finance AI copilot we were developing for banks.

End-to-end evaluation was not enough for us. We wanted granular evaluations that help pinpoint bottlenecks and identify what to improve and how.

We’ve since developed more metrics and made the framework more flexible so it can evaluate components like agent tool use, code change, retrieval steps, etc.
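To make "granular" concrete, here is a minimal plain-Python sketch of module-level evaluation for a hypothetical two-step retrieve-then-generate pipeline. It is not continuous-eval's API: `Sample`, `evaluate_sample`, the chunk IDs, the token-F1 proxy metric, and both thresholds are illustrative assumptions. The point is that scoring each step separately tells you *which* step failed, not just that the final answer was wrong.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical test case: question, ground-truth chunk IDs, expected answer."""
    question: str
    relevant_chunk_ids: set[str]
    expected_answer: str


def retrieval_metrics(retrieved_ids: list[str], relevant_ids: set[str]) -> dict[str, float]:
    """Precision and recall of the retrieval step against ground-truth chunk IDs."""
    hits = sum(1 for cid in retrieved_ids if cid in relevant_ids)
    precision = hits / len(retrieved_ids) if retrieved_ids else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return {"precision": precision, "recall": recall}


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between generated and expected answer (a simple proxy metric)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate_sample(
    sample: Sample,
    retrieved_ids: list[str],
    answer: str,
    recall_threshold: float = 0.5,  # assumed cut-off, tune per application
    f1_threshold: float = 0.5,      # assumed cut-off, tune per application
) -> dict:
    """Score each module separately and name the first step that falls below its threshold."""
    retrieval = retrieval_metrics(retrieved_ids, sample.relevant_chunk_ids)
    generation = {"token_f1": token_f1(answer, sample.expected_answer)}
    if retrieval["recall"] < recall_threshold:
        bottleneck = "retrieval"   # the right context never reached the LLM
    elif generation["token_f1"] < f1_threshold:
        bottleneck = "generation"  # context was there, but the answer is off
    else:
        bottleneck = None
    return {"retrieval": retrieval, "generation": generation, "bottleneck": bottleneck}


if __name__ == "__main__":
    s = Sample("What was Q3 revenue?", {"doc1#p2"}, "revenue was 5 million")
    print(evaluate_sample(s, ["doc3#p1", "doc4#p7"], "I could not find that"))
    print(evaluate_sample(s, ["doc1#p2", "doc4#p7"], "revenue was 5 million"))
```

An end-to-end check would flag only that the first run's answer is wrong; the per-module scores attribute it to retrieval (recall 0.0), so the fix belongs in the retriever rather than the prompt.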

Let us know what you think of our approach to GenAI app evaluation.


songrenchu 841 days ago

Nice! Componentization of GenAI workflows is a new trend, and we do need an evaluation framework like this.


antonap 841 days ago

Yup, that’s what drove us to build a module-level framework, since our app is made up of many non-deterministic components. Give it a spin and let us know what you think!
