Show HN: I built a tamper-evident evidence system for AI agents

Show HN: I built a tamper-evident evidence system for AI agents (guardianreplay.pages.dev)

2 points by Slaine 108 days ago

The demo loads two runs directly in your browser — no signup, no uploads, no network calls after page load.

Frank: a conservative agent. Verification returns VALID. Phil: an aggressive agent with tampered evidence. Verification returns INVALID and points to the exact line where the chain breaks.

The problem I was solving: when an AI agent does something unexpected in production, the post-mortem usually comes down to "trust our logs." I wanted evidence that could cross trust boundaries — from engineering to security, compliance, or regulators — without asking anyone to trust a dashboard.

How it works:

- Every action, policy decision, and state transition is recorded into a hash-chained NDJSON event log
- Logs are sealed into evidence packs (ZIP) with manifests and signatures
- A verifier (also in the demo) validates integrity offline and returns VALID / INVALID / PARTIAL with machine-readable reason codes
- The same inputs always produce the same artifacts, so diffs are meaningful and replay is deterministic
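
For readers who want to see the hash-chaining idea concretely, here is a minimal sketch in Python. It is not the project's actual code: the field names (`event`, `prev_hash`, `hash`), the all-zero genesis value, SHA-256, and the sorted-key JSON canonicalization are all assumptions made for illustration. Each NDJSON line commits to the previous line's hash, so an edit or deletion shows up at the first line whose hash or back-link no longer checks out.

```python
import hashlib
import json
from typing import List, Optional, Tuple

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first event


def canonical(obj: dict) -> bytes:
    # Sorted keys and fixed separators, so the same event always yields the same bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_event(log: List[str], event: dict) -> None:
    """Append one NDJSON line whose hash covers the event and the previous hash."""
    prev_hash = json.loads(log[-1])["hash"] if log else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    digest = hashlib.sha256(canonical(body)).hexdigest()
    log.append(canonical({**body, "hash": digest}).decode("utf-8"))


def verify_chain(log: List[str]) -> Tuple[str, Optional[int]]:
    """Return ("VALID", None) or ("INVALID", 1-based line number of the first break)."""
    prev_hash = GENESIS
    for lineno, line in enumerate(log, start=1):
        try:
            record = json.loads(line)
            body = {"event": record["event"], "prev_hash": record["prev_hash"]}
            stored = record["hash"]
        except (ValueError, KeyError, TypeError):
            return ("INVALID", lineno)  # unparseable or incomplete record
        if body["prev_hash"] != prev_hash:
            return ("INVALID", lineno)  # back-link does not match the prior line
        if hashlib.sha256(canonical(body)).hexdigest() != stored:
            return ("INVALID", lineno)  # content was changed after hashing
        prev_hash = stored
    return ("VALID", None)
```

Because serialization is canonical, building the log twice from the same events gives byte-identical lines, which is the property that makes diffs between runs meaningful. A hash chain alone only detects edits by someone who cannot rewrite every later line; the signatures mentioned above are what would cover that case, and they are left out of this sketch.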

The verifier and the UI are deliberately separated. The UI can be wrong. The verifier will still accept or reject based on cryptographic proof.
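
One way to picture that separation is a verifier written as a pure function over the pack's bytes, with the UI only rendering whatever it returns. The sketch below is hypothetical and not taken from the project: the manifest shape (file name to SHA-256 hex digest), the reason codes `FILE_MISSING`, `HASH_MISMATCH`, and `FILE_UNLISTED`, and the reading of PARTIAL as "nothing contradicts the manifest, but some listed files are absent" are all assumptions. Signature checking is omitted.

```python
import hashlib
from typing import Dict, List


def verify_pack(manifest: Dict[str, str], files: Dict[str, bytes]) -> dict:
    """Check pack contents against manifest digests. No I/O and no UI state."""
    reasons: List[dict] = []
    for name, expected in sorted(manifest.items()):
        if name not in files:
            reasons.append({"code": "FILE_MISSING", "file": name})
        elif hashlib.sha256(files[name]).hexdigest() != expected:
            reasons.append({"code": "HASH_MISMATCH", "file": name})
    # Files present in the pack but not covered by the manifest are also flagged.
    for name in sorted(set(files) - set(manifest)):
        reasons.append({"code": "FILE_UNLISTED", "file": name})

    codes = {r["code"] for r in reasons}
    if not reasons:
        status = "VALID"
    elif codes == {"FILE_MISSING"}:
        status = "PARTIAL"  # assumed meaning: incomplete coverage, no contradiction
    else:
        status = "INVALID"
    return {"status": status, "reasons": reasons}
```

A dashboard, a CLI, and a CI job could all call the same function and would have to agree on the verdict, since none of them takes part in computing it.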

Built this before the recent public incidents around autonomous agents made it topical. Happy to answer questions about the architecture, the proof boundary design, or the gaps I'm still working on.

1 comments

shubhamintech 108 days ago

The "trust our logs" problem is real: regulators and security teams don't care about your dashboard. Curious about the semantic layer though: once you can verify a log is intact, the next hard question is why the agent made the specific decision that caused an incident. Integrity proves the what, but you still need the interpretability layer for the why.

Slaine 108 days ago

Yeah, totally agree. Integrity mostly answers the “what happened” part.

The idea is that once the sequence of events is provably intact, you can attach the decision context to it — things like policy snapshots, inputs/prompts (or hashes of them), and state transitions.

Then the evidence layer proves the history wasn’t altered, and analysis tools can reconstruct why the system made a particular decision from that preserved context.
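
As a rough sketch of what attaching decision context could look like (field names such as `prompt_sha256` and `policy_sha256` and the helper functions are made up for illustration, not the project's schema): the event stores hashes of the prompt and the policy snapshot, so the log commits to them without containing the raw text. An analysis tool can later be handed the original prompt and policy and confirm they are the ones the event refers to.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def policy_digest(policy_snapshot: dict) -> str:
    # Canonical JSON so key order does not change the digest.
    encoded = json.dumps(policy_snapshot, sort_keys=True, separators=(",", ":"))
    return sha256_hex(encoded.encode("utf-8"))


def decision_event(action: str, prompt: str, policy_snapshot: dict,
                   state_before: str, state_after: str) -> dict:
    """Build an event that commits to its decision context via hashes."""
    return {
        "type": "decision",
        "action": action,
        "prompt_sha256": sha256_hex(prompt.encode("utf-8")),
        "policy_sha256": policy_digest(policy_snapshot),
        "transition": {"from": state_before, "to": state_after},
    }


def context_matches(event: dict, prompt: str, policy_snapshot: dict) -> bool:
    """Check that a supplied prompt and policy are the ones the event committed to."""
    return (
        event.get("prompt_sha256") == sha256_hex(prompt.encode("utf-8"))
        and event.get("policy_sha256") == policy_digest(policy_snapshot)
    )
```

Storing hashes rather than raw prompts keeps sensitive text out of the evidence pack, at the cost of needing the originals kept somewhere else for the "why" analysis.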

The demo focuses on the integrity layer because without that, everything else turns into “trust our dashboard.” Interpretability tools can sit on top of the same evidence.
