| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by tiffanyh 126 days ago
	Super interesting work. Q: how is your AAP different than the industry work happening on Intent/Instructions.

1 comments

alexgarden 126 days ago

The short version: instructions tell the model what to do. An Alignment Card declares what the agent committed to do — and then a separate system verifies it actually did.

Most intent/instruction work (system prompts, Model Spec, tool-use policies) is input-side. You're shaping behavior by telling the model "here are your rules." That's important and necessary. But it's unverifiable — you have no way to confirm the model followed the instructions, partially followed them, or quietly ignored them.

AAP is an output-side verification infrastructure. The Alignment Card is a schema-validated behavioral contract: permitted actions, forbidden actions, escalation triggers, values. Machine-readable, not just LLM-readable. Then AIP reads the agent's reasoning between every action and compares it to that contract. Different system, different model, independent judgment.

Bonus: if you run through our gateway (smoltbot), it can nudge the agent back on course in real time — not just detect the drift, but correct it.

So they're complementary. Use whatever instruction framework you want to shape the agent's behavior. AAP/AIP sits alongside and answers the question instructions can't: "did it actually comply?"

link

tiffanyh 126 days ago

> Then AIP reads the agent's reasoning between every action and compares it to that contract.

How would this work? Is one LLM used to “read” (and verify) another LLMs reasoning?

link

alexgarden 126 days ago

Yep... fair question.

So AIP and AAP are protocols. You can implement them in a variety of ways.

They're implemented on our infrastructure via smoltbot, which is a hosted (or self-hosted) gateway that proxies LLM calls.

For AAP it's a sidecar observer running on a schedule. Zero drag on the model performance.

For AIP, it's an inline conscience observer and a nudge-based enforcement step that monitors the agent's thinking blocks. ~1 second latency penalty - worth it when you must have trust.

For both, they use Haiku-class models for intent summarization; actual verification is via the protocols.

link

tiffanyh 126 days ago

Dumb question: don’t you eventually need a way to monitor the monitoring agent?

If a second LLM is supposed to verify the primary agent’s intent/instructions, how do we know that verifier is actually doing what it was told to do?

link

alexgarden 126 days ago

Not a dumb question — it's the right one. "Who watches the watchmen" has been on my mind from the start of this.

Today the answer is two layers:

The integrity check isn't an LLM deciding if it "feels" like the agent behaved. An LLM does the analysis, but the verdict comes from checkIntegrity() — deterministic rule evaluation against the Alignment Card. The rules are code, not prompts. Auditable.

Cryptographic attestation. Every integrity check produces a signed certificate: SHA-256 input commitments, Ed25519 signature, tamper-evident hash chain, Merkle inclusion proof. Modify or delete a verdict after the fact, and the math breaks.

Tomorrow I'm shipping interactive visualizations for all of this — certificate explorer, hash chain with tamper simulation, Merkle tree with inclusion proof highlighting, and a live verification demo that runs Ed25519 verification in your browser. You'll be able to see and verify the cryptography yourself at mnemom.ai/showcase.

And I'm close to shipping a third layer that removes the need to trust the verifier entirely. Think: mathematically proving the verdict was honestly derived, not just signed. Stay tuned.

link

tiffanyh 126 days ago

Appreciate all you’re doing in this area. Wishing you the best.

link