| Hi there — you're raising the right questions and these are exactly the attack vectors I built AgentSign to handle. It's not just signing. AgentSign has 5 subsystems (patent pending) and two of them directly address what you're describing: Compromised agent scenario: Subsystem 3 is Runtime Code Attestation. Before every execution, the agent's code is SHA-256 hashed and compared against the attested hash from onboarding. If agent A gets injected via a malicious document and its runtime is modified, the hash comparison fails and execution is blocked. This isn't a one-time check at onboarding — it runs continuously, pre-execution. A compromised agent can't sign anything because it fails attestation before it gets to sign. Replay attacks: Subsystem 2 is Execution Chain Verification — a signed DAG of input/output hashes with unique execution IDs and timestamps bound to each interaction. Replaying a signed payload triggers an execution ID collision. Every agent-to-agent call is a unique, signed, timestamped link in the chain. Trust delegation: AgentSign deliberately has no delegation mechanism. Each agent presents its own passport independently at the verification gate (we call it THE GATE — POST /api/mcp/verify). There's no "agent A vouches for agent B." Every agent is verified on its own identity, its own code attestation, its own trust score. If an attacker controls agent B, they still need B to pass runtime attestation independently — which it won't if the code has been tampered with. Behavioral integrity: Subsystem 5 is Cryptographic Trust Scoring. It's not static — it factors in execution verification rate, success history, code attestation status, and pipeline stage. An agent that starts producing anomalous outputs drops in trust score dynamically and gets flagged. Identity without behavioral integrity is exactly the gap trust scoring fills. The five subsystems working together: identity certs, execution chains, runtime attestation, output tamper detection, and trust scoring. Remove any one and you have the gaps you're describing. Together they close them. That said — I'd genuinely welcome your findings. Red-teaming is how this gets battle-hardened. You can reach me at raza@agentsign.dev or check the SDK at github.com/razashariff/agentsign-sdk. |
The one I want to probe is the file-based hash attestation assumption. If the SHA-256 check runs against on-disk bytes: env injection, lazy-loaded remote modules, and eval() of fetched content all modify execution context without touching the binary. On-disk hash stays clean, behavior changes.
Also interested in whether trust score timing creates an elevation path — benign calls that build score, then exploitation once the threshold is cleared.
Emailed you at raza@agentsign.dev with a formal proposal. $299 flat for a structured adversarial run, first-look before anything is published.