Show HN: AxonFlow, governing LLM and agent workflows


11 points by saurabhjain1592 155 days ago

Hi HN, we’re building AxonFlow for teams running LLMs or agents in real production systems.

Once agent workflows move past demos, failures are rarely model issues. They tend to show up as execution problems during real runs.

Short 2-minute technical demo showing execution control and auditability in practice: https://youtu.be/FNgnESo9RtI

AxonFlow is a self-hosted, source-available (BSL 1.1) control plane that sits inline in the execution path and governs LLM calls, tool calls, retries, approvals, and policy enforcement step by step. It does not replace your orchestrator and can run alongside LangChain, CrewAI, or custom systems.

The problems we focus on are usually discovered only after going to production:

- retries that accidentally repeat side effects
- partial failures mid-workflow
- permissions that differ per step
- limited ability to inspect or intervene during execution

This is not aimed at early demos or hobby projects. It’s for teams already operating under real production constraints.

GitHub: https://github.com/getaxonflow/axonflow

Docs: https://docs.getaxonflow.com

I’d value feedback from folks running LLM or agent workflows in production.

6 comments

mansi_mittal 152 days ago

This resonates with real pain I’ve seen once agent systems leave the demo phase. Most failures aren’t model quality issues — they are retries with side effects, partial execution failures, and lack of visibility/control once things are running live. The idea of a lightweight, inline control plane that doesn’t replace the orchestrator but governs execution step-by-step feels like a pragmatic way to tackle that.

I especially liked the ability to start in observe-only mode and progressively enforce policies, and the focus on auditability and permissions per step. That’s the kind of thing teams usually end up building ad-hoc once compliance or reliability becomes non-optional.

A couple of things I’m curious about (and would love your thoughts on):

1. How you think about debugging or replay for long-running/stateful workflows when enforcement decisions affect downstream steps

2. What you’re seeing in practice around latency overhead at scale when AxonFlow is fully in the hot path

Overall, this feels like it’s aimed at the right stage of maturity — not early demos, but teams already feeling production constraints.

saurabhjain1592 151 days ago

Thanks for the thoughtful read. You’ve described exactly the maturity stage we’re targeting: past demos, dealing with retries, partial failures, side effects, and the need for real control once systems are live.

On your questions:

1. Debugging and replay for stateful workflows

We capture step-level execution snapshots across the workflow. Each snapshot records inputs, outputs, duration, tokens, cost, evaluated policies, triggered policies, and the resolution (approved, blocked, overridden).

For enforcement-specific debugging, each snapshot includes which policy matched, what content triggered it, and how it was resolved. When a downstream step fails because an upstream step was blocked or modified, you can trace the execution timeline and see exactly where and how the data flow changed.
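To make the snapshot idea concrete, here is a minimal Go sketch of a step-level record carrying the fields listed above, plus a helper that walks the timeline to find the first step where enforcement changed the data flow. The type names, field names, and `FirstIntervention` are hypothetical illustrations, not AxonFlow's actual schema or API.

```go
package main

import (
	"fmt"
	"time"
)

// Resolution is the outcome recorded for a step (values from the description above).
type Resolution string

const (
	Approved   Resolution = "approved"
	Blocked    Resolution = "blocked"
	Overridden Resolution = "overridden"
)

// Snapshot is a hypothetical step-level execution record, not AxonFlow's schema.
type Snapshot struct {
	Step              string
	Input, Output     string
	Duration          time.Duration
	Tokens            int
	CostUSD           float64
	EvaluatedPolicies []string
	TriggeredPolicies []string
	Resolution        Resolution
}

// FirstIntervention walks the timeline in order and returns the index of the
// first step that was blocked or overridden, or -1 if every step was approved.
func FirstIntervention(timeline []Snapshot) int {
	for i, s := range timeline {
		if s.Resolution == Blocked || s.Resolution == Overridden {
			return i
		}
	}
	return -1
}

func main() {
	timeline := []Snapshot{
		{Step: "fetch_customer", Resolution: Approved},
		{Step: "summarize", TriggeredPolicies: []string{"pii_email"}, Resolution: Overridden},
		{Step: "send_email", Resolution: Blocked},
	}
	if i := FirstIntervention(timeline); i >= 0 {
		s := timeline[i]
		fmt.Printf("data flow changed at step %q (%s) by %v\n", s.Step, s.Resolution, s.TriggeredPolicies)
	}
}
```

Here the downstream `send_email` failure traces back to the earlier override in `summarize`, which is the kind of "where did it change" question the snapshots are meant to answer.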

We also support human in the loop pause and resume. A step can be paused for approval and later resumed, with the decision and rationale recorded as part of the execution history.
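A pause-and-resume flow like this is essentially a small state machine where the human decision is appended to the step's history before execution continues. The sketch below assumes hypothetical state names and a `Decision` type; it illustrates the idea and is not AxonFlow's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// StepState is a hypothetical lifecycle for a step gated by human approval.
type StepState string

const (
	Running           StepState = "running"
	PausedForApproval StepState = "paused_for_approval"
	Resumed           StepState = "resumed"
	Rejected          StepState = "rejected"
)

// Decision captures who decided, what, and why, so it stays in the execution history.
type Decision struct {
	Approver  string
	Approved  bool
	Rationale string
	At        time.Time
}

// Step is a hypothetical workflow step with its recorded decisions.
type Step struct {
	Name    string
	State   StepState
	History []Decision
}

var ErrNotPaused = errors.New("step is not awaiting approval")

// Pause moves a running step into the approval-wait state.
func (s *Step) Pause() error {
	if s.State != Running {
		return fmt.Errorf("cannot pause step in state %s", s.State)
	}
	s.State = PausedForApproval
	return nil
}

// Resume records the human decision first, then resumes or rejects the step.
func (s *Step) Resume(d Decision) error {
	if s.State != PausedForApproval {
		return ErrNotPaused
	}
	s.History = append(s.History, d)
	if d.Approved {
		s.State = Resumed
	} else {
		s.State = Rejected
	}
	return nil
}

func main() {
	s := &Step{Name: "issue_refund", State: Running}
	_ = s.Pause()
	_ = s.Resume(Decision{Approver: "ops-lead", Approved: true, Rationale: "amount under policy limit", At: time.Now()})
	fmt.Println(s.State, len(s.History))
}
```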

This is not yet full deterministic replay (re-running with identical LLM outputs), but it provides enough visibility to answer “what happened” and “why” in production, which covers most real debugging scenarios.

2. Latency overhead at scale

We operate in two modes depending on requirements:

- Compliance mode: policy violations and blocked requests are written synchronously before returning. This adds a few milliseconds for violation cases, but guarantees the audit record exists before the caller sees the result.

- Performance mode: audit writes are queued asynchronously. Policy evaluation still happens inline, since it may block execution, but persistence is decoupled using bounded queues and worker goroutines.
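The two modes above can be sketched as one audit writer that persists blocked decisions synchronously in compliance mode and otherwise enqueues into a bounded channel drained by worker goroutines. `Store`, `Auditor`, and `ErrQueueFull` are hypothetical names; the in-memory store stands in for real persistence, and a production system would need retries and alerting on write failures.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// AuditRecord is a hypothetical audit entry for a policy decision.
type AuditRecord struct {
	RequestID string
	Blocked   bool
	Policy    string
}

// Store stands in for durable persistence; here it is in-memory.
type Store struct {
	mu      sync.Mutex
	records []AuditRecord
}

func (s *Store) Write(r AuditRecord) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.records = append(s.records, r)
	return nil
}

func (s *Store) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.records)
}

var ErrQueueFull = errors.New("audit queue full")

// Auditor is a hypothetical writer supporting both modes.
type Auditor struct {
	store      *Store
	compliance bool
	queue      chan AuditRecord
	wg         sync.WaitGroup
}

func NewAuditor(store *Store, compliance bool, queueSize, workers int) *Auditor {
	a := &Auditor{store: store, compliance: compliance, queue: make(chan AuditRecord, queueSize)}
	for i := 0; i < workers; i++ {
		a.wg.Add(1)
		go func() {
			defer a.wg.Done()
			for r := range a.queue {
				_ = a.store.Write(r) // a real system would retry or alert on failure
			}
		}()
	}
	return a
}

// Record runs after inline policy evaluation has already decided the outcome.
func (a *Auditor) Record(r AuditRecord) error {
	if a.compliance && r.Blocked {
		// Compliance mode: the record exists before the caller sees the result.
		return a.store.Write(r)
	}
	select {
	case a.queue <- r: // non-blocking enqueue
		return nil
	default:
		return ErrQueueFull // bounded queue: surface backpressure instead of blocking
	}
}

// Close drains the queue and waits for workers; call it only after the last Record.
func (a *Auditor) Close() {
	close(a.queue)
	a.wg.Wait()
}

func main() {
	store := &Store{}
	a := NewAuditor(store, true, 100, 2)
	_ = a.Record(AuditRecord{RequestID: "r1", Blocked: true, Policy: "pii_ssn"})
	fmt.Println("synchronously persisted:", store.Len())
	_ = a.Record(AuditRecord{RequestID: "r2"})
	a.Close()
	fmt.Println("after drain:", store.Len())
}
```

The design choice is where the latency lands: only violation cases pay for a synchronous write, while allowed requests pay just for a channel send.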

Most policies are rule-based pattern matching rather than LLM calls. In practice, teams see single-digit millisecond overhead per request for typical policy sets. Heavier redaction or more complex policies can increase this, but the behavior remains predictable.

Observe-only mode adds essentially no latency beyond the audit write, since no blocking decisions are made.

On orchestration boundaries:

AxonFlow does not require replacing your existing orchestrator. Most teams keep LangChain, LangGraph, or CrewAI for stateful workflow execution and use AxonFlow as a step-level control plane, adding policy gates before each step runs.

For teams building from scratch or wanting tighter integration, AxonFlow can also handle orchestration end to end with governance built in. In practice, most start by adding governance to existing workflows and only consider deeper orchestration later.

For related discussion on how we think about the observability to enforcement gap, there’s a deeper thread here that may be relevant: https://news.ycombinator.com/item?id=46603800

Happy to go deeper on any of this if useful.

dhruvghulati 149 days ago

This is massively needed. I know large corporations are building their own frameworks, but a new business looking to go agentic can’t do it without this - LangChain just doesn’t scale. I also came across this: https://github.com/axsaucedo/kaos

There is also a security layer to be built on top.

Excited to see where this space goes. I think early users will be really innovative scale-ups that are going code red around agents under a top-down mandate, maybe looking to reinvent themselves.

patthar 155 days ago

Nice work, Saurabh. I think you have tackled a deeper problem in LLM and agent governance. Small to midsize companies are racing ahead with agentic integrations where security generally comes as an afterthought; your product offers a no-fluff approach to bolting security on early rather than as an afterthought. I’d like to take a critical view, and I’m curious to hear your and others’ thoughts.

1) AxonFlow offers a dual-mode architecture: as a gateway, or as full-blown governance via proxy mode. In my experience, projects (in enterprise) start small but quickly find themselves requiring deeper, finer-grained control than just a gateway check. What migration paths do you give users for a seamless transition? The last thing a project wants at a certain stage is to rewrite all its LLM invocations to go through axonflow.execute_query(). This migration cliff exists, and it would be good to address it early in your architecture.

2) The static (sub-10 ms, in-memory) + dynamic (sub-30 ms, cached) split is good for performance, but the documentation shows policies as a central construct loaded into Postgres/Redis. There is little visibility into how complex/custom/conditional policies (e.g., business-rule-dependent, ML-based anomaly scoring, or external IdP-attribute checks) are authored, versioned, tested, or rolled out safely. AxonFlow risks becoming a bottleneck if policy logic grows beyond simple PII/SQLi/rate-limit rules, especially since dynamic policies still incur DB/cache round-trips. That kind of policy logic is common in enterprise environments.

3) With complex rules come performance expectations. As a suggestion, you could publish more standard performance benchmarks with sufficiently complex rules, both in structure and in count. In real-world production scenarios, overlapping policies, cold caches, and expensive dynamic lookups could significantly push up tail latencies.

4) Finally, multi-agent planning seems to break the guiding "control plane, not orchestration" boundary. I have no knowledge of the internals, and perhaps it’s the documentation giving me this perspective, but proxy mode seems to inch towards direct competition with LangChain/CrewAI.

Most of my observations come from the documentation. Please excuse any errors in my understanding and correct them where required.

Wishing you the best.

saurabhjain1592 155 days ago

Thanks for taking the time to write this. This is exactly the kind of critique I want.

1) Migration path: gateway to proxy

You are right that many enterprise projects start with perimeter checks and then want deeper execution control. We designed gateway mode specifically as the lowest-friction on-ramp, not as the end state. The migration path is not "rewrite everything into axonflow.execute_query". The core idea is to keep your orchestration code and incrementally move enforcement points under AxonFlow. Practically, teams start with gateway mode around one workflow for pre-check plus audit. Then they adopt step-level enforcement by passing step metadata and calling AxonFlow per step. Finally, for workflows where they want deterministic replay, approvals, and a full execution trace, they can run those paths in proxy mode while leaving other paths in gateway mode. So it can be mixed; you do not have to flip the whole system at once.
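One way to picture the mixed deployment is a per-workflow mode table with a default, where individual workflows are promoted one at a time. `Router`, `Promote`, and the `Mode` values are hypothetical names for illustration; how AxonFlow actually configures modes per workflow is not described here.

```go
package main

import "fmt"

// Mode is how a given workflow is governed (hypothetical names for the two modes described).
type Mode int

const (
	Gateway Mode = iota // orchestrator makes the call; control plane pre-checks and audits
	Proxy               // control plane holds credentials and executes the call
)

func (m Mode) String() string {
	if m == Proxy {
		return "proxy"
	}
	return "gateway"
}

// Router keeps hypothetical per-workflow mode assignments so paths can migrate one at a time.
type Router struct {
	defaultMode Mode
	overrides   map[string]Mode
}

func NewRouter(def Mode) *Router {
	return &Router{defaultMode: def, overrides: map[string]Mode{}}
}

// Promote moves a single workflow to a new mode without touching the others.
func (r *Router) Promote(workflow string, m Mode) { r.overrides[workflow] = m }

// ModeFor returns the workflow's override if one exists, else the default.
func (r *Router) ModeFor(workflow string) Mode {
	if m, ok := r.overrides[workflow]; ok {
		return m
	}
	return r.defaultMode
}

func main() {
	r := NewRouter(Gateway)
	r.Promote("refunds", Proxy) // needs approvals and a full execution trace
	for _, wf := range []string{"refunds", "faq_bot", "summarizer"} {
		fmt.Printf("%-10s -> %s\n", wf, r.ModeFor(wf))
	}
}
```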

2) Authoring, versioning, testing, and rollout of complex policies

Agreed that if policy logic grows beyond simple checks, the system has to support safe change management. Today policies are treated as a first-class config artifact with explicit versioning and an audit trail, and we expect teams to test them like code. We are also investing in better tooling here, including policy simulation against recorded traces, staged rollout, and regression guardrails. On "dynamic implies DB/cache round trips": the hot path is designed to stay cached, and dynamic policy evaluation is bounded and observable. If a policy needs external calls, that is explicit and should be treated as a production dependency with its own SLO, not something hidden inside the control plane.

3) Benchmarks

Agreed. The numbers in the docs are indicative for simple checks, not a substitute for real benchmarks. We should publish benchmarks that include overlapping policies, cold cache, and more expensive dynamic lookups, and show tail latency percentiles. This is on the near-term roadmap because for an inline control plane, p95 and p99 matter more than averages.

4) Multi-agent planning versus the control plane boundary

This is a fair callout. Our primary goal is governance, not planning. Proxy mode needs some orchestration primitives to enforce execution safely, for example retries with side-effect control, approval gates, and step-level auditability. We are not trying to compete with LangChain or CrewAI on planning sophistication. In most deployments, we expect users to keep their orchestrator and use AxonFlow as a governance layer. The multi-agent planning capability exists mainly so teams can start with a governed runtime for simple cases, not as a replacement for full-featured agent frameworks.

If you are open to it, I would love to dig into one concrete example from your experience, like what "migration cliff" looked like and which policy types became the bottleneck. Happy to correct any gaps in my understanding too.

HappyPablo 155 days ago

When you speak about deterministic policy enforcement, are these policies regex-based, or are some based on hard-limit business logic? Do you provide ways to track LLM API cost on a per-user basis? It has been a constant headache for our team to efficiently track API usage per user.

saurabhjain1592 155 days ago

Good question.

By deterministic policy enforcement we mean rule-based checks that evaluate to an explicit allow or block decision at execution time. Today that includes a mix of regex-based checks (for example PII patterns), structured detectors, and hard limits or business rules like cost caps, rate limits, and permission constraints. These policies are evaluated inline before model or tool calls, so the outcome is predictable and auditable rather than probabilistic.
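A minimal sketch of what "deterministic" means here: a fixed, ordered list of rules (a regex check and a hard cost cap) that always yields the same explicit allow/block decision for the same input. The `Request` fields, the single SSN regex, and the rule names are hypothetical and far simpler than real PII detection.

```go
package main

import (
	"fmt"
	"regexp"
)

// Request is a hypothetical view of an outgoing LLM/tool call at evaluation time.
type Request struct {
	User          string
	Prompt        string
	EstCostUSD    float64 // estimated cost of this call
	SpentTodayUSD float64 // cost already attributed to the user today
}

// Decision is an explicit allow/block result with the rule that produced it.
type Decision struct {
	Allow bool
	Rule  string
}

// Rule returns a block reason, or "" to pass.
type Rule func(Request) string

// Illustrative pattern only; real PII detection needs more than one regex.
var ssn = regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)

func PIIRule(r Request) string {
	if ssn.MatchString(r.Prompt) {
		return "pii_ssn"
	}
	return ""
}

// CostCap returns a hard-limit business rule for a hypothetical daily per-user budget.
func CostCap(limitUSD float64) Rule {
	return func(r Request) string {
		if r.SpentTodayUSD+r.EstCostUSD > limitUSD {
			return "daily_cost_cap"
		}
		return ""
	}
}

// Evaluate runs rules in a fixed order; the same input always yields the same decision.
func Evaluate(r Request, rules []Rule) Decision {
	for _, rule := range rules {
		if reason := rule(r); reason != "" {
			return Decision{Allow: false, Rule: reason}
		}
	}
	return Decision{Allow: true}
}

func main() {
	rules := []Rule{PIIRule, CostCap(5.00)}
	fmt.Println(Evaluate(Request{User: "u1", Prompt: "SSN is 123-45-6789"}, rules))
	fmt.Println(Evaluate(Request{User: "u2", Prompt: "hi", EstCostUSD: 0.40, SpentTodayUSD: 4.80}, rules))
	fmt.Println(Evaluate(Request{User: "u3", Prompt: "hi", EstCostUSD: 0.10}, rules))
}
```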

On cost tracking: yes, AxonFlow captures per-call metadata including model, tokens, provider, and cost, and attributes it to user, workflow, and tenant. In gateway mode this is per-call audit logging, and in proxy mode it extends across multi-step workflows so you can see cost accumulation per user or execution. We also recently shipped Workflow Control Plane which tracks policy evaluation and cost accumulation across multi-step agent executions, so you get a single audit trail and cost rollup for an entire workflow, not just individual calls. That's been a common pain point we've seen with teams running agents in production.
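Once each call carries user, workflow, and execution attribution, per-user or per-execution cost is a group-by over those records. The sketch below uses hypothetical field names and placeholder model names; it shows the rollup idea, not AxonFlow's storage or query layer.

```go
package main

import (
	"fmt"
	"sort"
)

// CallRecord mirrors the per-call metadata described above (field names are hypothetical).
type CallRecord struct {
	User, Workflow, Tenant string
	ExecutionID            string
	Provider, Model        string
	Tokens                 int
	CostUSD                float64
}

// Totals aggregates cost and tokens for one key.
type Totals struct {
	CostUSD float64
	Tokens  int
	Calls   int
}

// Rollup groups call records by an arbitrary key, e.g. user or execution ID.
func Rollup(records []CallRecord, key func(CallRecord) string) map[string]Totals {
	out := make(map[string]Totals)
	for _, r := range records {
		k := key(r)
		t := out[k]
		t.CostUSD += r.CostUSD
		t.Tokens += r.Tokens
		t.Calls++
		out[k] = t
	}
	return out
}

func main() {
	records := []CallRecord{
		{User: "alice", Workflow: "triage", ExecutionID: "e1", Model: "model-a", Tokens: 1200, CostUSD: 0.012},
		{User: "alice", Workflow: "triage", ExecutionID: "e1", Model: "model-b", Tokens: 800, CostUSD: 0.020},
		{User: "bob", Workflow: "report", ExecutionID: "e2", Model: "model-a", Tokens: 500, CostUSD: 0.005},
	}
	byUser := Rollup(records, func(r CallRecord) string { return r.User })
	users := make([]string, 0, len(byUser))
	for u := range byUser {
		users = append(users, u)
	}
	sort.Strings(users) // map iteration order is random; sort for stable output
	for _, u := range users {
		t := byUser[u]
		fmt.Printf("%s: $%.3f over %d calls (%d tokens)\n", u, t.CostUSD, t.Calls, t.Tokens)
	}
	byExec := Rollup(records, func(r CallRecord) string { return r.ExecutionID })
	fmt.Printf("execution e1 total: $%.3f\n", byExec["e1"].CostUSD)
}
```

Keeping the grouping key as a function means the same records answer both the per-user question and the per-workflow-execution rollup.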

fpierfed 155 days ago

Nice project! Quick question: how do you handle LLM access control in practice? For example, can different steps in a workflow run under different credentials or provider accounts, and is that enforced centrally by AxonFlow or delegated to the underlying orchestrator? Thanks!

saurabhjain1592 155 days ago

Thanks. In practice, access control is enforced centrally by AxonFlow, not delegated to the orchestrator.

Each LLM or tool call is evaluated at execution time against the active policy context, which includes the user, workflow, step, and tenant. That allows different steps in the same workflow to run under different credentials, providers, or cost and permission constraints if needed.

In gateway mode, the orchestrator still issues the call, but AxonFlow pre-authorizes it and records the decision so the policy is enforced consistently. In proxy mode, AxonFlow holds and applies the credentials itself and routes the call to the appropriate provider.

The key point is that credentials and access rules are defined once and enforced centrally, while orchestration logic remains separate.

fpierfed 155 days ago

What kind of latency does this add? I guess for LLM operations the extra latency might not be that important. Is that correct?

saurabhjain1592 155 days ago

Good question. The overhead is designed to be low enough for inline enforcement. For the fast, rule-based checks we typically see single-digit millisecond evaluation time, and in gateway mode the end-to-end pre-check usually adds around 10 to 15 ms.

You’re right that relative to an LLM call this is usually negligible, but we still treat it seriously because policy checks also sit in front of tool calls and other non-LLM operations where latency matters more. That’s why the static checks are compiled and cached and the gateway path is kept tight.

If you want more detail, I have a longer architecture walkthrough that goes into the execution path and performance model: https://youtu.be/hvJMs3oJOEc

fpierfed 155 days ago

Understood. Pretty cool, good luck with the project!

widow-maker 155 days ago

Does AxonFlow support redaction on images? We have noticed multiple times that people in our org share images containing critical information with public APIs.

rethinkNow 155 days ago

We are currently working on adding image redaction support to AxonFlow. This feature became a priority because it was a core requirement for a secure browser feature I have been developing.

I have already forked the community codebase and am actively working on the implementation to ensure sensitive data in images is protected before reaching public APIs. I will share updates as soon as it's ready for use.
