HN Mirror (Hacker News)

by skhatter 102 days ago

Interesting — are you instrumenting the agent workflows themselves with OpenTelemetry spans?

I was wondering how well the standard o11y stack works once agents start running multi-step workflows (agent → tools → other agents → APIs). Tracing probably helps visualize the steps, but I'm curious how people handle operational things like retries, replaying failed workflows, or containing cascading failures across agents.

Those reliability aspects are what I've been exploring.
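
One common way to handle the retry and containment questions above is exponential backoff combined with a circuit breaker around each downstream agent or tool. Below is a minimal, framework-agnostic sketch. CircuitBreaker, retry, and the default thresholds are hypothetical names and values, not taken from any particular agent framework.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling a dependency whose breaker is open."""


class CircuitBreaker:
    """Opens after max_failures consecutive errors; after reset_after seconds
    it lets one trial call through (half-open) and closes again on success."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open; a failed trial reopens
            raise
        self.failures = 0  # any success closes the breaker fully
        return result


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient errors with exponential backoff
    (base_delay, 2*base_delay, ...). An open breaker is not retried."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # the dependency is known to be down; don't hammer it
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A call wrapped as retry(lambda: breaker.call(call_tool)), where call_tool stands in for any agent or tool invocation, retries transient errors. Once the breaker trips, it fails fast, so one broken dependency does not stall every agent upstream of it.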

verdverm 101 days ago

I use ADK (Google's Agent Development Kit), a well-thought-through framework from a company that leads in both AI and engineering. They are the only company with enough experience in both to build a solid, production-worthy agent framework. It handles nearly everything you mention. What remains sits at the agent-design level and, imo, isn't the purview of any framework, i.e., the wiggum loop level.

People who are too focused on AI and what's popular are missing the forest for the trees. Most of the problems and solutions I see being discussed have already been taken care of by mature tooling choices and engineering practices. History and archaeology are underrated skills in software.

chirdeeps 101 days ago

OpenTelemetry and standard observability stacks are great for seeing the latency and token counts of individual LLM calls, but they break down when you try to debug the coordination between agents.

The hardest failure mode we've had to debug isn't a single agent hallucinating. It's Agent A correctly doing its job but passing slightly malformed state to Agent B, which then confidently executes a destructive action based on that bad state. By the time you see the error, the root cause is three steps up the chain.

Tracing doesn't solve this because it only shows you the execution path, not the authority boundary. What you actually need is a way to enforce contracts between agents: an execution layer that says "Agent B cannot accept this payload from Agent A unless it meets X criteria, and if it fails, roll back Agent A's last action." Until we treat multi-agent systems as concurrent state machines rather than just chained API calls, debugging them is going to remain a nightmare.
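
The contract-plus-rollback idea can be sketched as a guarded handoff. This is essentially the compensating-transaction (saga) pattern: each step carries an undo, and the consumer only runs if the producer's payload passes a validator. AgentStep, handoff, delete_contract, and the record_ids field are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]


class ContractViolation(Exception):
    """Raised when a handoff payload fails the receiving agent's contract."""


@dataclass
class AgentStep:
    run: Callable[[], Payload]        # the agent's action; returns state for the next agent
    undo: Callable[[Payload], None]   # compensating action that reverses it


def handoff(producer: AgentStep,
            contract: Callable[[Payload], List[str]],
            consumer: Callable[[Payload], Any]) -> Any:
    """Run the producer, validate its output against the consumer's contract,
    and roll the producer back rather than let the consumer act on bad state."""
    payload = producer.run()
    errors = contract(payload)
    if errors:
        producer.undo(payload)  # roll back Agent A's last action
        raise ContractViolation("; ".join(errors))
    return consumer(payload)


# Example contract guarding a destructive consumer (field names are hypothetical).
def delete_contract(payload: Payload) -> List[str]:
    errors = []
    ids = payload.get("record_ids")
    if not isinstance(ids, list) or not ids:
        errors.append("record_ids must be a non-empty list")
    elif not all(isinstance(i, int) for i in ids):
        errors.append("record_ids must contain integers")
    return errors
```

The check runs at the boundary rather than inside Agent B, so a malformed payload is rejected before any destructive action starts.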

skhatter 97 days ago

The “authority boundary” framing is really helpful — tracing explains what happened, but not whether a transition between agents should have been allowed.

Curious how teams are handling this today — are those contracts usually defined explicitly (schemas / validators), or are they mostly implicit in the agent code and discovered only after failures?
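
On the explicit side, one option is to write the authority boundary down as a transition table. Each handoff is then checked against what should be allowed, instead of being reconstructed from traces afterwards. Here is a minimal, single-threaded sketch. The stage names, agent roles, ALLOWED table, and Workflow class are all hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Stage(Enum):
    PLANNED = "planned"
    RESEARCHED = "researched"
    DRAFTED = "drafted"
    APPROVED = "approved"
    EXECUTED = "executed"


# Authority boundary: (current stage, acting agent) -> stages it may move to.
ALLOWED: Dict[Tuple[Stage, str], Set[Stage]] = {
    (Stage.PLANNED, "researcher"): {Stage.RESEARCHED},
    (Stage.RESEARCHED, "writer"): {Stage.DRAFTED},
    (Stage.DRAFTED, "reviewer"): {Stage.APPROVED, Stage.RESEARCHED},
    (Stage.APPROVED, "executor"): {Stage.EXECUTED},
}


class IllegalTransition(Exception):
    pass


class Workflow:
    def __init__(self) -> None:
        self.stage = Stage.PLANNED
        self.history: List[Tuple[Stage, str, Stage]] = []  # accepted transitions only

    def transition(self, agent: str, target: Stage) -> None:
        allowed = ALLOWED.get((self.stage, agent), set())
        if target not in allowed:
            raise IllegalTransition(
                f"{agent} may not move {self.stage.value} -> {target.value}")
        self.history.append((self.stage, agent, target))
        self.stage = target
```

A concurrent version would need locking or an append-only event log. The upside of a table like this is that the contract is reviewable on its own rather than buried in agent code.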

verdverm 100 days ago

If you can't trace across agents (the same way you would across services), then you haven't set up OTEL completely.
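
Cross-agent tracing works the same way as cross-service tracing. The caller passes its trace context along with the message, and the callee starts a child span under the same trace ID. The stdlib-only sketch below hand-rolls the W3C Trace Context traceparent header (version-traceid-parentid-flags) to show what actually travels between agents. agent_a, agent_b, and the message shape are hypothetical. In a real OpenTelemetry setup, opentelemetry.propagate.inject() and extract() handle this with the default W3C propagator.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    trace_id: str        # 32 lowercase hex chars, shared by the whole trace
    span_id: str         # 16 lowercase hex chars, unique per span
    sampled: bool = True


def new_root() -> SpanContext:
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def child_of(parent: SpanContext) -> SpanContext:
    # Same trace, fresh span ID; a real span would also record parent.span_id.
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def to_traceparent(ctx: SpanContext) -> str:
    # W3C Trace Context: version-traceid-parentid-flags
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"


def from_traceparent(header: str) -> SpanContext:
    version, trace_id, span_id, flags = header.split("-")
    if version != "00" or len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError(f"malformed traceparent: {header!r}")
    return SpanContext(trace_id, span_id, (int(flags, 16) & 0x01) == 0x01)


# Hypothetical agents exchanging a message dict instead of HTTP headers.
def agent_a(send) -> SpanContext:
    ctx = new_root()
    send({"task": "summarize", "headers": {"traceparent": to_traceparent(ctx)}})
    return ctx


def agent_b(message: dict) -> SpanContext:
    parent = from_traceparent(message["headers"]["traceparent"])
    return child_of(parent)  # B's work lands in A's trace
```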

The hard failure you describe lives at a different layer of control. That's separate from OP's question, which is about seeing what happens so you can design those control systems in the first place. That layer is more guards, validators, and the like (more subagents).

I keep more of a human in the loop, because these things aren't ready for prime time in the way you describe using them. On average, that approach just burns tokens, imo.
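
Keeping a human in the loop can be as simple as an approval gate in front of actions flagged as destructive. Here is a minimal sketch. Action, execute, and ask_on_console are hypothetical names, and it assumes the destructive flag is declared by whoever defines each tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    destructive: bool          # assumed to be declared by whoever defines the tool
    run: Callable[[], str]


def execute(action: Action, approve: Callable[[Action], bool]) -> str:
    """Run safe actions directly; require human approval for destructive ones."""
    if action.destructive and not approve(action):
        return f"skipped {action.name}: not approved"
    return action.run()


def ask_on_console(action: Action) -> bool:
    # One possible approver: block until a person answers at the terminal.
    answer = input(f"Agent wants to run {action.name!r}. Allow? [y/N] ")
    return answer.strip().lower() == "y"
```

Passing the approver in as a function keeps the gate testable and lets a console prompt, a chat message, or a ticket queue fill the same role.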

skhatter 97 days ago

That makes sense — sounds like a lot of this is handled at the framework + design level in your setup.

In practice, when something does go wrong in a multi-step workflow, do you typically rely on tracing + manual debugging, or do you have built-in mechanisms for partial replay / recovery?
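
For partial replay, one option is to checkpoint workflow state after every successful step and skip completed steps on rerun. Here is a minimal file-based sketch. run_workflow and the JSON checkpoint layout are hypothetical, and the sketch assumes the workflow state is JSON-serializable.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Tuple[str, Callable[[State], State]]


def run_workflow(steps: List[Step], checkpoint: Path, state: State) -> State:
    """Run steps in order, saving progress after each one. If a checkpoint
    exists, resume from it: completed steps are skipped, not re-executed."""
    done: List[str] = []
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
        done, state = saved["done"], saved["state"]
    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier run; don't repeat its side effects
        state = fn(state)  # an exception here leaves the last checkpoint intact
        done.append(name)
        checkpoint.write_text(json.dumps({"done": done, "state": state}))
    return state
```

This gives at-least-once execution. If a crash happens after a step's side effect but before the checkpoint write, that step replays, so steps that touch external systems should be idempotent.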