Taming randomness in ML models with hypothesis testing and marimo


	Taming randomness in ML models with hypothesis testing and marimo (blog.mozilla.ai)
	54 points by mscolnick 607 days ago

2 comments

axpy906 605 days ago

Good post. I've been thinking a bit about offline testing of LLM tasks these days and have come to the conclusion that old-school testing is the best option until more mature features can be developed. Specifically, I mean running a power analysis to determine the sample size, drawing a random sample of that size, and then running a test such as a z-test to see whether there is a difference and within what bounds. Tests are expensive, and I wish there were a better way to do realizable offline evals.
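
For anyone who wants to try that workflow, here is a minimal sketch using only the Python standard library. It assumes the eval metric is a pass/fail rate compared between two model variants, uses the usual normal-approximation formulas for a two-sided two-proportion z-test, and the function names and example rates are hypothetical, not taken from the linked post.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided two-proportion z-test.

    p1 and p2 are the pass rates you expect (assumed, not measured).
    Uses the standard normal-approximation formula.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


def two_proportion_z_test(success_a, n_a, success_b, n_b, alpha=0.05):
    """Return (z statistic, two-sided p-value, confidence interval for p_b - p_a)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    diff = p_b - p_a

    # Pooled standard error under the null hypothesis of equal pass rates.
    pooled = (success_a + success_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z_stat = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

    # Unpooled standard error for the interval around the observed difference.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_unpooled
    return z_stat, p_value, (diff - margin, diff + margin)


if __name__ == "__main__":
    # Hypothetical rates: baseline passes 70% of cases, candidate is hoped to pass 75%.
    n = sample_size_two_proportions(0.70, 0.75)
    print("cases needed per variant:", n)

    # Hypothetical results after sampling 1000 cases per variant.
    z_stat, p_value, (low, high) = two_proportion_z_test(700, 1000, 750, 1000)
    print(f"z={z_stat:.2f} p={p_value:.4f} CI=({low:.3f}, {high:.3f})")
```

The sketch does not guard against degenerate inputs (for example, every case passing in both groups makes the pooled standard error zero), and the normal approximation gets shaky with small samples or rates near 0 or 1.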

rhdunn 605 days ago

Have you seen LLM testing tools like promptfoo?

axpy906 605 days ago

Yes, I have seen it, and BrainTrust too. Unfortunately, I need something FOSS that works at scale without a vendor.

amarcheschi 605 days ago

I (luckily) passed my statistics exam this summer; it's cool, however, to visualize what's happening.
