| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by pbronez 121 days ago
	Yup, agree. “Evaluations” = Tests Gets pretty meta when you’re evaluating a model which needs to evaluate the output of another agent… gotta pin things down to ground truth somewhere.