Hacker News
Agent-evals: Metacognitive scoring and boundary testing for LLM coding agents (thinkwright.ai)
2 points by oceanwaves 117 days ago