HN Mirror


by buzzovich 288 days ago

Ah, you're misunderstanding. I'm not measuring "cognition."

See, it's much simpler.

Concrete test setup:

  - Flawed codebase given to agents for review

  - Agent A: Standard behavioural instructions

  - Agent B: Same + COGNITION::ETHOS (4 lines added)

Agent B found 20% more flaws than Agent A. Only variable: those 4 lines.

Objective measurement: count of flaws detected.

N=40 runs, statistically significant improvement.
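For anyone who wants to check a claim like this themselves: the comment doesn't say which significance test was used, so here is one way it could be done, a one-sided permutation test on per-run flaw counts. This is a minimal sketch, not the repo's method. The function name `permutation_test` is made up for illustration, and the counts in the usage section are placeholders, not data from the experiment.

```python
import random


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test for mean(b) > mean(a).

    a, b: per-run flaw counts for Agent A and Agent B.
    Returns (observed difference in means, estimated p-value).
    """
    rng = random.Random(seed)
    n_a = len(a)
    n_b = len(b)
    observed = sum(b) / n_b - sum(a) / n_a

    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the labels: under the null hypothesis the 4 extra
        # lines make no difference, so any split is equally likely.
        rng.shuffle(pooled)
        diff = sum(pooled[n_a:]) / n_b - sum(pooled[:n_a]) / n_a
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Placeholder counts, NOT the real results: swap in the per-run
    # flaw counts from the actual runs.
    agent_a = [10, 9, 11, 10, 8, 10, 9, 11]
    agent_b = [12, 11, 13, 12, 10, 12, 11, 13]
    diff, p = permutation_test(agent_a, agent_b)
    print(f"mean difference: {diff:.2f}, p-value: {p:.4f}")
```

A permutation test makes no assumption about how the counts are distributed, which seems reasonable for small integer counts from a few dozen runs. It only answers whether the gap is bigger than chance, not whether the prompt lines are the cause.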

The evidence is all in the repo.