Hacker News new | ask | show | jobs
by calebkaiser 358 days ago
I'm biased in that I work on an open source project in this space, but I would strongly recommend starting with a free/open source platform for debugging/tracing, annotating, and building custom evals.

This niche of the field has come a very long way just over the last 12 months, and the tooling is so much better than it used to be. Trying to do this from scratch, beyond a "kinda sorta good enough for now" project, is a full-time engineering project in and of itself.

I'm a maintainer of Opik, but you have plenty of options in the space these days for whatever your particular needs are: https://github.com/comet-ml/opik

1 comments

Yes I'm not sure I really want to vibe code something that does auto evals on a sample of my OTEL traces any more than I want to build my own analytics library.

Alternatives to Opik include Braintrust (closed), Promptfoo (open, https://github.com/promptfoo/promptfoo) and Laminar (open, https://github.com/lmnr-ai/lmnr).

I've used and liked Promptfoo a lot, but I've run into issues when trying to do evaluations with too many independent variables. Works great for `models * prompts * variables`, but broke down when we wanted `models * prompts * variables^x`.
I've used Promptfoo for client projects and really enjoyed it once I got my head around it.