by saurabhjain1592 155 days ago

Thanks, this is a great question.

We intentionally avoid framing guardrails as “X percent confidence” checks on prompts or model output. In practice, probabilistic confidence at the text level has been the weakest place to enforce safety, especially once workflows become multi-step and stateful.

AxonFlow’s non-human guardrails are primarily deterministic and grounded in context rather than based on model judgment. Concretely, they focus on:

- authorization checks on actions, tools, and write paths rather than output quality

- permission evaluation per step using actual tool arguments and proposed side effects

- invariant checks on state transitions, for example whether an action is allowed given what the system has observed so far

- policy decisions that can halt execution entirely rather than degrade or retry
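
To make the list above concrete, here is a rough sketch of a per-step check. This is not AxonFlow's actual code or API; every name in it (`Step`, `RunState`, `PERMISSIONS`, `MAX_REFUND`, `evaluate`, the tool names) is made up for illustration. It only shows the shape: an authorization lookup, a check on the actual tool arguments, an invariant on what the run has observed, and a halt as the failure mode.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    HALT = "halt"


@dataclass
class Step:
    actor: str   # hypothetical agent identity
    tool: str    # hypothetical tool name
    args: dict   # the actual arguments the agent wants to pass


@dataclass
class RunState:
    # Facts the run has observed so far, e.g. ("invoice", "inv-1").
    observed: set = field(default_factory=set)


# Hypothetical static policy: which actor may call which tool.
PERMISSIONS = {
    ("billing-agent", "read_invoice"),
    ("billing-agent", "issue_refund"),
}
MAX_REFUND = 100  # hypothetical limit on a proposed side effect


def evaluate(step: Step, state: RunState) -> tuple:
    """Return (Decision, reason). No model output or score is consulted."""
    # 1. Authorization on the action itself.
    if (step.actor, step.tool) not in PERMISSIONS:
        return Decision.HALT, "actor not authorized for tool"
    if step.tool == "issue_refund":
        # 2. Permission evaluated against the real arguments.
        if step.args.get("amount", 0) > MAX_REFUND:
            return Decision.HALT, "refund exceeds limit"
        # 3. Invariant on state: only refund an invoice this run has read.
        if ("invoice", step.args.get("invoice_id")) not in state.observed:
            return Decision.HALT, "invoice not observed in this run"
    return Decision.ALLOW, "ok"
```

The same inputs always produce the same decision, and a failed check stops the run instead of triggering a retry.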

We do use probabilistic components in narrow, explicit places such as PII detection or risk classification, but those always feed into a deterministic policy decision. The system never proceeds because a model “seems confident enough.”
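
As a sketch of what "feeds into a deterministic policy decision" can look like, assume a PII detector that returns a score between 0 and 1. The threshold, the `destination` values, and the function name below are assumptions for illustration, not our real configuration.

```python
PII_THRESHOLD = 0.5  # hypothetical cutoff, fixed in policy rather than chosen by the model


def policy_decision(pii_score: float, destination: str) -> str:
    """Turn a classifier score into a fact, then apply a fixed rule to that fact."""
    contains_pii = pii_score >= PII_THRESHOLD
    # Rule: data flagged as PII never leaves for an external destination.
    if contains_pii and destination == "external":
        return "halt"
    return "allow"
```

The classifier only contributes one boolean input. The rule that decides whether the step runs is plain code that can be reviewed and tested like any other access check.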

Human approval gates are not there because non-human guardrails are insufficient in principle. They exist because some actions are irreversible or have a high blast radius, and no amount of model confidence should bypass explicit authorization for those.
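
A minimal sketch of such a gate, with a hypothetical action list and function name: the model's confidence is accepted as a parameter only to show that it is never read.

```python
from typing import Optional

# Hypothetical set of actions treated as irreversible or high blast radius.
IRREVERSIBLE = {"delete_database", "wire_transfer"}


def may_execute(action: str, model_confidence: float, approved_by: Optional[str]) -> bool:
    """Gate on explicit human approval; model_confidence is deliberately unused."""
    if action in IRREVERSIBLE:
        return approved_by is not None
    return True
```

With this shape, `may_execute("wire_transfer", 0.999, None)` is refused, while the same call with a named approver goes through.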

So the distinction we draw is less about validating LLM output and more about deciding whether the system is allowed to move forward at all, given the concrete context and constraints at that moment.