Hacker News new | ask | show | jobs
Show HN: Circe – Deterministic, offline-verifiable receipts for AI agent actions (github.com)
2 points by W_rey45 155 days ago
I’ve been working on a small primitive for agentic systems: a cryptographically signed receipt that records what an AI agent decided, what it did, and what changed — as a single canonical JSON artifact.

The problem: Agent systems today rely on logs, dashboards, or proprietary consoles for truth. Those are easy to forge, truncate, or lose. If an agent takes a high-stakes action (e.g. a firewall change, a deployment, a purchase), there’s no portable artifact you can independently verify later.

The idea: Treat agent execution like a signed transaction, not a log stream. Each run emits a receipt that can be verified offline, without trusting the issuer’s infrastructure.

How it works (minimal core):

Deterministic signing: Ed25519 signatures over a canonical JSON byte string

Canonicalization: RFC 8785-style JSON canonicalization (stable key ordering, UTF-8 encoding, no insignificant whitespace)

Tamper evidence: Any mutation of the signed payload flips the SHA-256 hash and invalidates the signature

Offline verification: A standalone verifier script; no network calls, no dependencies on the issuer

Try it locally (no network):

python verify_receipt.py hn_receipt.json python verify_receipt.py hn_receipt_tampered.json

The first passes; the second fails after a single-field mutation.

This is intentionally not a logging system, observability platform, or policy engine. It’s a small integrity / provenance primitive intended to compose with higher-level agent frameworks.

I’d appreciate feedback on:

Threat-model gaps (e.g. confused-deputy or context-hijacking risks)

Schema ergonomics for high-frequency or long-running agent pipelines

Canonicalization edge cases worth enforcing earlier

3 comments

Threat model / scope: This design assumes the signer’s private key is trusted at issuance time; it does not attempt to prove semantic correctness of the agent’s reasoning or inputs. The signature covers only the canonicalized signed_block; any mutation invalidates verification. Receipts are portable and verifiable offline but do not prevent a malicious issuer from signing false data (integrity primitive, not a truth oracle). Replay is detectable (e.g. via hash chaining or external indexing) but not prevented by the receipt alone. Confidentiality is out of scope; receipts are integrity-only artifacts. The goal is to make post-hoc tampering and log forgery detectable, not to replace policy enforcement or access control.
Solid primitive. Two questions:

1. Crash edge case: If an agent executes a side-effect and dies before signing the receipt, is that action orphaned? Any WAL-style intent/completion model?

2. Multi-step workflows: Do receipts chain natively (parent pointers/Merkle) or via external linking? (I see storage/ledgers are out of scope, but curious about the linkage design.)

The negative proof angle (proving AI didn't touch prod) is compelling for compliance.

Great questions.

1) Crash/orphan side-effect: agreed this needs an intent/commit model. The clean pattern is WAL-style: emit/sign an “INTENT” receipt before side-effect + a “COMMIT” receipt after; absence of COMMIT is itself evidence (“we can’t attest completion”). Another option is tool-level signing so the side-effecting tool returns a signed result that the agent includes.

2) Linking: yes linkage can be native without building storage. Next iteration is parent_hash (or prev_hash) inside the signed_block so receipts naturally chain; Merkle/log indexing stays external.

The "signed transaction, not a log stream" framing is exactly right. Logs are optimistic - you assume they're complete and unmodified. Receipts are pessimistic - you verify before trusting.

We've been thinking about a related problem at toran.sh: capturing what an agent actually sent to external APIs (and what came back) without trusting the agent's self-reported logs. Different angle - we focus on the API request/response level rather than the decision/action level - but the same underlying insight: the source of truth needs to be outside the agent's control.

The Ed25519 + canonical JSON approach is clean. Question: how are you handling schema evolution? If the receipt format changes, older receipts still need to verify but newer tooling might expect different fields.

Great question this is exactly the tension we’re trying to be explicit about.

CIRCE separates cryptographic verification from semantic interpretation. The signature covers a minimal, stable signed_block (canonicalized → hashed → signed). Everything else is metadata that can evolve without affecting verification.

Older receipts remain verifiable because the verifier only assumes the signed scope + canonicalization rules. Newer tooling can understand more fields, but must ignore unknown/missing fields (JWT / signed artifact style). We also include a schema identifier/hash for tooling selection, but it’s intentionally not security-critical — verification is purely about integrity.

Also: toran.sh’s angle is super aligned. Capturing actual API request/response outside the agent’s control feels like the “ground truth” complement to CIRCE’s “decision truth.” Curious: are you anchoring the API transcript via a sidecar/proxy with its own signing key, or are you doing something like a transparency log/Merkle chain for requests?