by buzzovich
241 days ago
Ah, you're misunderstanding. I'm not measuring "cognition." See, it's much simpler. Concrete test setup:
- Flawed codebase given to agents for review
- Agent A: Standard behavioural instructions
- Agent B: Same + COGNITION::ETHOS (4 lines added)
Agent B found 20% more flaws than Agent A.
Only variable: those 4 lines. Objective measurement: count of flaws detected. N=40 runs, statistically significant improvement. The evidence is all in the repo.
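The comment doesn't say which significance test was used or how the 40 runs were split between the two agents. Here is a hypothetical sketch, assuming you have one flaw count per run for each agent. It shows how the "20% more flaws" figure (relative increase in mean count) and a p-value could be computed. The function names and the choice of a two-sided permutation test are illustrative assumptions, not what the repo actually does.

```python
import random
from statistics import mean


def relative_improvement(counts_a, counts_b):
    """Relative increase in mean flaws found by agent B over agent A.

    0.2 means B found 20% more flaws on average.
    """
    return (mean(counts_b) - mean(counts_a)) / mean(counts_a)


def permutation_p_value(counts_a, counts_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean flaw counts.

    Hypothetical helper: shuffles the pooled per-run counts, re-splits them
    into groups of the original sizes, and counts how often the shuffled
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(counts_b) - mean(counts_a))
    pooled = list(counts_a) + list(counts_b)
    n_a = len(counts_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_a:]) - mean(pooled[:n_a]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)
```

A permutation test is used here because flaw counts are small integers that may not be normally distributed. A t-test, or a paired test if both agents reviewed the same codebase in each run, would be equally reasonable choices.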