by buzzovich 241 days ago
Ah, you're misunderstanding. I'm not measuring "cognition."

See, it's much simpler.

Concrete test setup:

  - Flawed codebase given to agents for review

  - Agent A: Standard behavioural instructions

  - Agent B: Same + COGNITION::ETHOS (4 lines added)

Agent B found 20% more flaws than Agent A. Only variable: those 4 lines.

Objective measurement: count of flaws detected.

N=40 runs, statistically significant improvement.
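
For anyone who wants to sanity-check a claim like this themselves, one common option is a permutation test on per-run flaw counts. Rough sketch below, not necessarily the test used in the repo. The counts are made-up placeholders (not the real data), and `permutation_test` is just an illustrative helper name.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test for mean(b) > mean(a).

    Returns (observed mean difference, approximate p-value).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the labels: under the null hypothesis, which agent
        # produced which count does not matter.
        rng.shuffle(pooled)
        diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly 0.
    return observed, (hits + 1) / (n_resamples + 1)


# HYPOTHETICAL flaw counts per run, for illustration only.
agent_a = [10, 9, 11, 10, 8, 10, 12, 9, 10, 11]
agent_b = [12, 11, 13, 12, 10, 12, 14, 11, 12, 13]

diff, p = permutation_test(agent_a, agent_b)
print(f"mean difference: {diff:.2f} flaws/run, approx p = {p:.4f}")
```

A permutation test makes no normality assumption, which suits small samples of integer counts. It does assume the runs are independent and that only the prompt differs between the two groups.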

The evidence is all in the repo.