HN Mirror

	by padolsey 306 days ago
	> This is cool, but I’m a little skeptical. If Parachute uses AI agents to evaluate other models, who’s evaluating the AI agents? Usually you can run human-in-the-loop spot checks to ensure that there's parity between your LLM evaluators and the equivalent specialist human evaluator.