Agent-evals: Metacognitive scoring and boundary testing for LLM coding agents

	Agent-evals: Metacognitive scoring and boundary testing for LLM coding agents (thinkwright.ai)
	2 points by oceanwaves 117 days ago