LLM Evals Are Just Tests. Why Are We Making This So Complicated?

	LLM Evals Are Just Tests. Why Are We Making This So Complicated? (cameronwestland.com)
	3 points by camwest 354 days ago

1 comments

8organicbits 354 days ago

So, did the tests allow you to build a system that never confused existing features with new features? That seems like the problem statement, but I think I'm only seeing probabilistic testing.

camwest 354 days ago

Never? No. Way less likely? Yes!

In dev we do 100 consistency checks and get green. In CI we do 10.
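
For readers wondering what "100 in dev, 10 in CI" might look like in practice, here is a minimal sketch, not the setup from the article. Everything in it is an assumption made for illustration: the function names, the use of a `CI` environment variable to pick the run count, and the idea that each check is a callable returning True or False.

```python
import os


def runs_for_environment(env=None):
    """Pick how many times to repeat a check.

    Assumption: a CI environment variable marks CI runs; the
    100 / 10 split mirrors the numbers quoted in the comment above.
    """
    env = os.environ if env is None else env
    return 10 if env.get("CI") else 100


def count_failures(check, runs):
    """Call the hypothetical `check` callable `runs` times and count the failures."""
    return sum(1 for _ in range(runs) if not check())


def pass_probability(failure_rate, runs):
    """Chance that every one of `runs` independent checks passes."""
    return (1.0 - failure_rate) ** runs


if __name__ == "__main__":
    runs = runs_for_environment()
    # Stand-in check that always passes; a real one would call the model.
    failures = count_failures(lambda: True, runs)
    print(f"{failures} failures in {runs} runs")
```

Under the (strong) assumption that runs are independent, a behavior that fails 1% of the time still passes 100 runs about 37% of the time and 10 runs about 90% of the time, so a green result lowers the odds of confusion rather than ruling it out.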
