Hacker News
Show HN: OpenSOP, We got tired of agents lying to us, so we built them a harness (opensop.ai)
5 points by carlosamg 14 days ago
OpenSOP is an early open-source runtime/standard for executable agentic processes.

You (or your agent) define a process in YAML, and OpenSOP exposes it as a typed REST API that agents and humans can both use.

We built it because a lot of agent workflows still live in prompts, docs, or one-off scripts instead of versioned process definitions, and we wanted more control and auditability. It's still under development, but we are using it in production (at Coba.ai). Feedback on the model, API shape, and use cases would be very useful.

We wanted to share it with the community; any feedback and comments are welcome, especially now that Claude has released Ant (https://x.com/ClaudeDevs/status/2061877343078244459), which is around the same concept.

1 comments

Hi, I'm Kuri, part of the core development team. Any suggestions or feedback would be highly appreciated.
What gets called "agents lying" usually isn't deception. It's the model telling a story about a gap that the runtime didn't fill. That gap could be a rule or a piece of state that the runtime was supposed to respect but never actually enforced, which means the contract exists only as text. It is best to use a harness that enforces the SOP out of the model's reach, because anything left as prose in context is one compaction or one subagent spawn away from evaporating. The most difficult case to test would be an SOP that depends on state that changes underneath the agent. If step 3 says "use the latest config" and another process updated it after the agent read it at step 1, will the harness re-check, or enforce against the snapshot the agent loaded?
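To make the stale-state case concrete, here is a minimal Python sketch of the check a re-checking harness would need. Everything in it (`ConfigStore`, `is_stale`, the `timeout` key) is hypothetical and not part of OpenSOP; it only compares the version the agent read at step 1 with the store's version at step 3.

```python
from dataclasses import dataclass


@dataclass
class ConfigStore:
    """Hypothetical versioned config store; not an OpenSOP type."""
    value: dict
    version: int = 1

    def read(self) -> tuple:
        # Return a copy of the value plus the version it was read at.
        return dict(self.value), self.version

    def update(self, new_value: dict) -> None:
        # Every write bumps the version so readers can detect staleness.
        self.value = new_value
        self.version += 1


def is_stale(store: ConfigStore, version_read: int) -> bool:
    """True if the store changed after the caller's read."""
    return store.version != version_read


store = ConfigStore({"timeout": 30})
snapshot, seen = store.read()   # step 1: the agent loads the config
store.update({"timeout": 5})    # meanwhile, another process changes it
stale = is_stale(store, seen)   # step 3: the harness re-checks before acting
```

A harness that enforces against the snapshot would keep using `snapshot` here; one that re-checks would notice `stale` and either reload or fail the step.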
Those are good points. The way we approached the "use the latest config" issue was to avoid references into somewhere else: if the SOP is critical, we make sure it loads those configs inside the step, as a deterministic process. That way we know for sure that they were loaded. If something was not loaded, the SOP fails loudly and produces an audit log of the failure, so it can be picked up and fixed.
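As a rough illustration of loading the config inside the step and failing loudly, here is a small Python sketch. It is not OpenSOP's implementation: `StepFailed`, `load_config_step`, and the audit entry fields are made-up names, and the config is assumed to arrive as a JSON string.

```python
import json
from datetime import datetime, timezone
from typing import Optional


class StepFailed(Exception):
    """Hypothetical error that aborts the run."""


def load_config_step(raw: Optional[str], audit_log: list) -> dict:
    """Load the config inside the step and record the outcome either way."""
    entry = {"step": "load_config", "at": datetime.now(timezone.utc).isoformat()}
    try:
        if raw is None:
            raise ValueError("config source returned nothing")
        config = json.loads(raw)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        entry.update(status="failed", error=str(exc))
        audit_log.append(entry)
        raise StepFailed(str(exc)) from exc
    entry["status"] = "ok"
    audit_log.append(entry)
    return config


log = []
ok = load_config_step('{"timeout": 30}', log)  # loads and logs "ok"
try:
    load_config_step(None, log)                # nothing loaded: fails loudly
except StepFailed:
    pass                                       # the failure is already in the log
```

The point of the design is that the model never decides whether the config was loaded; the step either returns it or raises, and the audit log has an entry in both cases.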

About the snapshot: we are using versioned SOPs so we can keep track of them and iterate. Right now, if an agent picks an SOP and runs it, it runs the current version; if we improve the SOP, the agent should pick up the new one. So the SOP gets loaded as a snapshot, runs once, produces the audit log, and ends the run. The harness won't re-check.
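The snapshot behavior described above could look roughly like this in Python. The `registry` layout, the `onboard` SOP, and `start_run` are assumptions for illustration only; the sketch just shows a run pinning whatever version is current when it starts, and a later run picking up the newer one.

```python
from types import MappingProxyType

# Hypothetical registry: SOP name -> list of versions, newest last.
registry = {"onboard": [{"version": 1, "steps": ["collect", "verify"]}]}


def start_run(name: str):
    """Pin the current version as a read-only snapshot for a single run."""
    current = registry[name][-1]
    return MappingProxyType(
        {"version": current["version"], "steps": tuple(current["steps"])}
    )


run = start_run("onboard")  # this run is pinned to version 1
registry["onboard"].append(
    {"version": 2, "steps": ["collect", "verify", "notify"]}
)
next_run = start_run("onboard")  # a later run picks up version 2
```

Because the snapshot is a read-only copy, publishing version 2 does not change what the in-flight run executes.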

Retrying a specific step when it fails would be interesting, though.
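One possible shape for a per-step retry, sketched in Python. This is not an existing OpenSOP feature; `run_step_with_retry`, `max_attempts`, and the `flaky` step are hypothetical, and the sketch assumes every attempt should land in the audit log.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_step_with_retry(step: Callable[[], T], max_attempts: int, audit_log: list) -> T:
    """Run one step, retrying only that step, and log every attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
        except Exception as exc:
            audit_log.append({"attempt": attempt, "status": "failed", "error": str(exc)})
            if attempt == max_attempts:
                raise  # out of attempts: fail loudly
        else:
            audit_log.append({"attempt": attempt, "status": "ok"})
            return result
    raise ValueError("max_attempts must be at least 1")


calls = {"n": 0}


def flaky() -> str:
    # Stand-in step that fails twice before succeeding.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"


log = []
result = run_step_with_retry(flaky, max_attempts=3, audit_log=log)
```

Only the failing step is re-run; earlier steps and the pinned SOP version are untouched, and the final failure still propagates if the attempts run out.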

Thank you for your comments!