Hacker News
by mierz00 106 days ago
I highly rate Braintrust.

It wouldn’t be too difficult to build something like that for your own usage, but I found it pretty easy to get datasets set up.

It’s essentially a game changer for understanding whether your prompts are working, especially if you’re doing something that requires high levels of consistency.

In our case we use an LLM for classification, which fits in perfectly with evals.

1 comments

Do you have any good takeaways or feedback on this? This is the first time I've heard of Braintrust (the eval platform), so I'll look into it, but I'm curious about your experience with it so far.
If I am being honest, the value came from doing evals and testing against different models.

Essentially all I needed was a way to upload a data set, run tests against that data set, and spit out a pass/fail percentage.

Braintrust makes this pretty easy, but if I were to do it again I would vibecode the same functionality.
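
For anyone wondering what "the same functionality" might look like, here is a rough sketch of the workflow described above: load a data set, run a classifier over it, and report a pass percentage. This is not Braintrust's API. The CSV columns (`input`, `expected`), the function names, and the `keyword_classifier` stand-in are all made up for illustration; in practice the classifier would wrap whatever LLM call you are testing.

```python
import csv
import io
from typing import Callable


def load_dataset(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text with hypothetical 'input' and 'expected' columns."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def run_eval(rows: list[dict[str, str]], classify: Callable[[str], str]) -> float:
    """Return the percentage of rows where the classifier matches the expected label."""
    if not rows:
        return 0.0
    passed = sum(
        1
        for row in rows
        if classify(row["input"]).strip().lower() == row["expected"].strip().lower()
    )
    return 100.0 * passed / len(rows)


def keyword_classifier(text: str) -> str:
    """Stand-in for an LLM call (assumption): swap in your own model wrapper."""
    return "negative" if "refund" in text.lower() else "positive"


# Tiny illustrative data set, not real data.
SAMPLE = """input,expected
I want a refund,negative
Love this product,positive
Terrible experience,negative
Works great,positive
"""

if __name__ == "__main__":
    score = run_eval(load_dataset(SAMPLE), keyword_classifier)
    print(f"pass rate: {score:.1f}%")
```

To compare models, you would call `run_eval` once per model wrapper on the same data set and compare the percentages.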