| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by olegbk 109 days ago

The thread has good ideas on fixing benchmarks (sandboxing, newer datasets). But there's a more fundamental problem: benchmarks are self-reported. The agent runs the test on itself.

An alternative we've been building: attestation-based reputation. Trust scores come from signed proof of work by independent agents who actually delegated tasks and verified outcomes. EigenTrust computes scores from the attestation graph, and NetFlow prevents sybil clusters from inflating each other. You can't inject a pytest hook into a signed interaction history.

Live visualization of how this works: https://agentveil.dev/live