


	by JamesSwift 175 days ago
	How would you evaluate it if the agent were not a fuzzy logic machine? The issue isn't the LLM; it's that verification is actually the hard part. In any case, it's typically called “evals”, and you can probably craft a test harness to evaluate these if you think about it hard enough.
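
For what it's worth, here is a rough sketch of what such a harness could look like. Everything in it is an assumption for illustration: `EvalCase`, `run_evals`, and `fake_agent` are hypothetical names, and `fake_agent` is a deterministic stand-in for whatever actually calls your agent. The point is only that each case carries its own verifier, and because the agent is nondeterministic you run each case several times and look at the pass rate rather than a single pass/fail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    name: str
    prompt: str
    # Verifier for this case: returns True if the output is acceptable.
    # Writing this function is the hard part the comment refers to.
    check: Callable[[str], bool]


def run_evals(agent: Callable[[str], str],
              cases: List[EvalCase],
              trials: int = 5) -> Dict[str, float]:
    """Run every case `trials` times and return the pass rate per case name."""
    results: Dict[str, float] = {}
    for case in cases:
        passes = sum(1 for _ in range(trials) if case.check(agent(case.prompt)))
        results[case.name] = passes / trials
    return results


def fake_agent(prompt: str) -> str:
    # Hypothetical stand-in for a real agent/LLM call (an assumption, not a real API).
    return "4" if "2 + 2" in prompt else "unsure"


CASES = [
    EvalCase("arithmetic", "What is 2 + 2?", lambda out: out.strip() == "4"),
    EvalCase("capital", "What is the capital of France?",
             lambda out: "paris" in out.lower()),
]

scores = run_evals(fake_agent, CASES, trials=3)
for name, rate in scores.items():
    print(f"{name}: {rate:.0%} pass rate")
```

With a real agent you would swap `fake_agent` for your own call and set a threshold on the pass rate per case; the checks themselves stay ordinary deterministic code.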