Hacker News
by devbas 155 days ago
I believe there is a big opportunity for LLM guardrails, given that the output of Transformer-based LLMs is non-deterministic.

However, the just-announced Claude Cowork still warns humans to stay in control (https://claude.com/blog/cowork-research-preview). I assume this is because their non-human guardrails are not yet good enough to fully validate an LLM's output.

What non-human guardrails does AxonFlow employ to enforce a policy rule with X% confidence on a prompt or LLM output?

1 comment

Thanks, this is a great question.

We intentionally avoid framing guardrails as "X percent confidence" checks on prompts or model output. In practice, probabilistic confidence at the text level has been the weakest place to enforce safety, especially once workflows become multi-step and stateful.

AxonFlow's non-human guardrails are primarily deterministic and grounded in context rather than based on model judgment. Concretely, they focus on:

- authorization checks on actions, tools, and write paths rather than output quality

- permission evaluation per step using actual tool arguments and proposed side effects

- invariant checks on state transitions, for example whether an action is allowed given what the system has observed so far

- policy decisions that can halt execution entirely rather than degrade or retry
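To make the list above concrete, here is a minimal Python sketch of a deterministic per-step check. It is illustrative only, not AxonFlow's code: the `ProposedAction` and `RunState` types, the tool names, the `/workspace/` write prefix, and the "read before overwrite" invariant are all hypothetical.

```python
import posixpath
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    HALT = "halt"


class PolicyHalt(Exception):
    """Raised to stop the whole run; there is no retry or degraded fallback."""


@dataclass(frozen=True)
class ProposedAction:
    # Hypothetical shape of one agent step: the tool and its concrete arguments.
    tool: str
    args: dict


@dataclass
class RunState:
    # Facts the system has actually observed so far in this run.
    files_read: set = field(default_factory=set)


# Hypothetical policy configuration for a single agent.
ALLOWED_TOOLS = {"read_file", "write_file", "search"}
WRITE_TOOLS = {"write_file": "path"}  # tool -> argument naming the path it modifies
WRITABLE_PREFIX = "/workspace/"


def evaluate(action: ProposedAction, state: RunState) -> tuple[Decision, str]:
    """Deterministic per-step check: the same inputs always yield the same decision."""
    # 1. Authorization on the tool itself, not on the quality of any text.
    if action.tool not in ALLOWED_TOOLS:
        return Decision.HALT, f"tool {action.tool!r} is not authorized"

    path_arg = WRITE_TOOLS.get(action.tool)
    if path_arg is not None:
        if path_arg not in action.args:
            return Decision.HALT, "write tool called without a path argument"
        # Normalize so '/workspace/../etc/passwd' cannot slip past the prefix check.
        path = posixpath.normpath(action.args[path_arg])
        # 2. Permission on the concrete side effect named in the arguments.
        if not path.startswith(WRITABLE_PREFIX):
            return Decision.HALT, f"write to {path!r} is outside the allowed paths"
        # 3. Invariant on the state transition: only overwrite files already observed.
        if path not in state.files_read:
            return Decision.HALT, f"{path!r} would be written before being read"

    return Decision.ALLOW, "ok"


def enforce(action: ProposedAction, state: RunState) -> None:
    # 4. A HALT stops execution outright instead of degrading or retrying.
    decision, reason = evaluate(action, state)
    if decision is Decision.HALT:
        raise PolicyHalt(reason)
```

Note that nothing here inspects prose or scores output quality; every branch depends only on the tool name, its arguments, and what the run has already observed.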

We do use probabilistic components in narrow, explicit places such as PII detection or risk classification, but those always feed into a deterministic policy decision. The system never proceeds because a model “seems confident enough.”
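One way to read "probabilistic components feed into a deterministic policy decision" is sketched below in Python. This is an assumption about the pattern, not AxonFlow's implementation: the destination allowlist, the 0.5 threshold, and the `pii_score` callable are hypothetical. The key property is that permission comes only from explicit configuration; the detector's score can narrow what is allowed but never grant anything, and a detector failure fails closed.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


# Hypothetical deterministic configuration.
ALLOWED_DESTINATIONS = {"internal-crm", "audit-log"}
PII_BLOCK_THRESHOLD = 0.5


def decide_send(destination: str, text: str, pii_score: Callable[[str], float]) -> Verdict:
    """Fixed rule over a probabilistic signal; the signal can only narrow permissions."""
    # Permission comes from explicit configuration, never from a model score.
    if destination not in ALLOWED_DESTINATIONS:
        return Verdict(False, f"destination {destination!r} is not authorized")
    try:
        score = pii_score(text)  # hypothetical detector returning a score in [0, 1]
    except Exception as exc:  # fail closed if the detector is unavailable
        return Verdict(False, f"PII detector failed: {exc}")
    # Reject malformed scores (NaN or outside [0, 1]) instead of guessing.
    if math.isnan(score) or not 0.0 <= score <= 1.0:
        return Verdict(False, f"PII detector returned invalid score {score!r}")
    if score >= PII_BLOCK_THRESHOLD:
        return Verdict(False, f"PII score {score:.2f} is at or above {PII_BLOCK_THRESHOLD}")
    return Verdict(True, "authorized destination and PII score below threshold")
```

A low score never opens a destination the allowlist did not already permit, so the system does not proceed merely because a model "seems confident enough."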

Human approval gates are not there because non-human guardrails are insufficient in principle. They exist because some actions are irreversible or have a high blast radius, and no amount of model confidence should bypass explicit authorization.
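A human approval gate of this kind might look like the following Python sketch. The `Step` fields, the 100-record blast-radius cutoff, and the approvals set are hypothetical; the point is only that the gate's signature takes no confidence value at all, so no model score can route around it.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"
    AWAIT_APPROVAL = "await_approval"


@dataclass(frozen=True)
class Step:
    step_id: str
    action: str
    irreversible: bool
    affected_records: int


# Hypothetical cutoff for what counts as "high blast radius".
BLAST_RADIUS_LIMIT = 100


def gate(step: Step, approvals: set[str]) -> Outcome:
    """Explicit authorization gate; deliberately has no confidence input."""
    needs_human = step.irreversible or step.affected_records > BLAST_RADIUS_LIMIT
    # `approvals` holds step IDs a human has explicitly signed off on.
    if needs_human and step.step_id not in approvals:
        return Outcome.AWAIT_APPROVAL
    return Outcome.PROCEED
```

Low-impact, reversible steps pass straight through; anything irreversible or broad waits until a human has approved that specific step.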

So the distinction we draw is less about validating LLM output and more about deciding whether the system is allowed to move forward at all, given the concrete context and constraints at that moment.