Hacker News
Agent-evals: Metacognitive scoring and boundary testing for LLM coding agents (thinkwright.ai)
2 points by oceanwaves 117 days ago