by sidk24 92 days ago
Author here. IMO, we have better observability for a Node.js service than for an AI agent.

I build AI agent infrastructure. The post came from a real debugging session. An agent modified 47 files, the build failed, and I spent twenty minutes scrolling terminal output before giving up and starting over.

The core argument: we solved observability for microservices over the last decade (OpenTelemetry, Datadog, Honeycomb, Grafana). AI agents are also distributed systems: a single task spans multiple LLM calls, tool invocations, file operations, and decision points. Yet an agent run leaves no structured trace, no cost attribution per task, no permission audit trail, and no session replay.

Four questions you cannot answer today:

1. What did the agent do? (no structured trace)
2. Why did it do it? (context is ephemeral)
3. What did it cost? (no per-task attribution)
4. What was it allowed to do? (no permission audit trail)

The patterns exist in distributed systems observability. They need to be adapted, not invented. OpenTelemetry's data model (trace IDs, spans, parent-child relationships) maps directly to agent execution.
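To make the mapping concrete, here is a rough sketch of what I mean. It is not the OpenTelemetry API, just a stdlib-only toy that borrows the same data model (one trace ID per task, spans with parent-child links). The names AgentTracer, Span, cost_of, and the attribute keys (cost_usd, allowed_tools, permitted) are all made up for illustration, and the dollar figures are placeholders.

```python
from __future__ import annotations

import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Span:
    """One step of agent execution: an LLM call, a tool call, a file write."""
    trace_id: str            # shared by every span in the same task
    span_id: str
    parent_id: str | None    # None marks the root span of the task
    name: str
    attributes: dict[str, object] = field(default_factory=dict)


class AgentTracer:
    """Hypothetical tracer: records spans in memory and nests them via a stack."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.spans: list[Span] = []
        self._stack: list[Span] = []

    @contextmanager
    def span(self, name: str, **attributes: object) -> Iterator[Span]:
        # The innermost open span becomes the parent of the new one.
        parent = self._stack[-1] if self._stack else None
        current = Span(
            trace_id=self.trace_id,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            name=name,
            attributes=dict(attributes),
        )
        self.spans.append(current)
        self._stack.append(current)
        try:
            yield current
        finally:
            self._stack.pop()

    def cost_of(self, root: Span) -> float:
        """Per-task cost: sum cost_usd over a span and all of its descendants."""
        total = float(root.attributes.get("cost_usd", 0.0))
        for child in self.spans:
            if child.parent_id == root.span_id:
                total += self.cost_of(child)
        return total


# Usage: one task, two LLM calls, one tool call.
tracer = AgentTracer()
with tracer.span("task: fix failing build", allowed_tools=["read_file", "write_file"]) as task:
    with tracer.span("llm_call: plan", cost_usd=0.012):
        pass
    with tracer.span("tool: write_file", path="src/app.py", permitted=True):
        pass
    with tracer.span("llm_call: review", cost_usd=0.003):
        pass

for s in tracer.spans:
    print(s.name, "parent:", s.parent_id, s.attributes)
print("task cost (USD):", round(tracer.cost_of(task), 6))
```

The span list covers "what did it do", the attributes carry cost and permission data, and the parent links let you roll cost up to the task. "Why did it do it" is the hard one: you would have to attach the prompt or context to each span, which this sketch does not attempt.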

Happy to discuss the technical details. Particularly interested in hearing from teams that have built ad-hoc agent logging and what they learned.