Ask HN: How do you shut down misbehaving AI in production?

Hacker News

2 points by nordic_lion 133 days ago

If you are running AI workloads/agents or LLM-backed systems in production, how do you actually shut one down when it starts behaving badly?

By “misbehaving” I mean things like:
- runaway spend
- latency issues
- prompt loops
- tool abuse or unexpected external calls
- data leakage risks
- cascading failures across downstream services

In most systems I’ve seen, there is good observability. You can see logs, traces, cost dashboards. But the actual shutdown mechanism often ends up being manual: disable a feature flag, revoke an API key, roll back a deployment, rate limit something upstream.

I am trying to understand what people are doing in practice.

- What is your actual kill mechanism?
- Is it bound to a model endpoint, an agent instance, a workflow, a Kubernetes workload, something else?
- Is shutdown automated under certain conditions, or always human-approved?
- What did you discover only after your first real incident?

Concrete examples would be extremely helpful.

1 comments

zachdotai 133 days ago

I found it more helpful to try to "steer" the LLM into self-correcting its actions when I detect misalignment. This generally improved our task completion success rate by 20%.


nordic_lion 133 days ago

Where/how do you define the policy boundary line that triggers course correction?


zachdotai 133 days ago

Basically through two layers. Hard rules (token limits, tool allowlists, banned actions) trigger an immediate block - no steering, just stop. Soft rules use a lightweight evaluator model that scores each step against the original task intent. If it detects semantic drift over two consecutive steps, we inject a corrective prompt scoped to that specific workflow.
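For readers who want something concrete, here is a minimal sketch of how the two layers described above might fit together. It is not the commenter's actual implementation: `StepPolicy`, `ToolCall`, `drift_scorer`, and the 0.5 threshold are all hypothetical names and values, and the scorer is a plain callable standing in for the lightweight evaluator model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    STEER = "steer"   # inject a corrective prompt
    BLOCK = "block"   # hard stop, no steering


@dataclass
class ToolCall:
    tool: str
    tokens_used: int


@dataclass
class StepPolicy:
    allowed_tools: frozenset
    token_limit: int
    # Hypothetical stand-in for the evaluator model: returns a drift score
    # in [0, 1], where higher means further from the original task intent.
    drift_scorer: Callable[[ToolCall], float]
    drift_threshold: float = 0.5   # assumed value, tune per workflow
    consecutive_needed: int = 2    # "two consecutive steps" from the thread
    _drift_streak: int = field(default=0, init=False)

    def check(self, call: ToolCall) -> Verdict:
        # Layer 1: hard rules trigger an immediate block.
        if call.tool not in self.allowed_tools:
            return Verdict.BLOCK
        if call.tokens_used > self.token_limit:
            return Verdict.BLOCK

        # Layer 2: soft rule, steer only after drift on consecutive steps.
        if self.drift_scorer(call) > self.drift_threshold:
            self._drift_streak += 1
        else:
            self._drift_streak = 0

        if self._drift_streak >= self.consecutive_needed:
            self._drift_streak = 0  # reset once a correction is issued
            return Verdict.STEER
        return Verdict.ALLOW
```

The hard rules run first so that a banned tool or blown token budget never reaches the (slower) evaluator at all.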

The key insight for us was that most failures weren't safety-critical, they were the agent losing context mid-task. A targeted nudge recovers those. Generic "stay on track" prompts don't work; the correction needs to reference the original goal and what specifically drifted.

Steer vs. kill comes down to reversibility. If no side effects have occurred yet, steer. If the agent already made an irreversible call or wrote bad data, kill.
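The steer-vs-kill rule and the targeted correction could be sketched roughly as follows. This is an illustration under assumptions, not the commenter's code: `RunState`, `steer_or_kill`, and `corrective_prompt` are hypothetical names, and tracking irreversibility as a simple counter is a simplification.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    original_goal: str
    # Assumed bookkeeping: number of irreversible actions already taken
    # (committed writes, external calls that cannot be undone, etc.).
    irreversible_side_effects: int = 0


def steer_or_kill(state: RunState) -> str:
    """Steer while nothing irreversible has happened; otherwise kill."""
    return "kill" if state.irreversible_side_effects > 0 else "steer"


def corrective_prompt(state: RunState, drift_note: str) -> str:
    """Build a correction that names the original goal and the specific
    drift, rather than a generic "stay on track" message."""
    return (
        f"Original goal: {state.original_goal}\n"
        f"Observed drift: {drift_note}\n"
        "Return to the original goal before taking further actions."
    )
```

For example, `steer_or_kill(RunState("Summarize Q3 invoices"))` returns `"steer"`, while the same state with `irreversible_side_effects=1` returns `"kill"`.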


nordic_lion 133 days ago

One thing I’m still unclear on: what runtime signal is the soft-rule evaluator actually binding to when it decides “semantic drift”?

In other words, what is the enforcement unit the policy is attached to in practice... a step, a plan node, a tool invocation, or the agent instance as a whole?


zachdotai 133 days ago

Tool invocation. Each time the agent emits a tool call, the evaluator assesses it against the original task intent plus a rolling window of recent tool results.
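One way to picture the evaluator's input is a fixed-size buffer of recent tool results combined with the task intent. The sketch below is an assumption about the shape of that input, not a description of the real system; `EvaluatorContext`, the window size of 5, and the text layout are all hypothetical.

```python
from collections import deque


class EvaluatorContext:
    """Holds the original task intent plus a rolling window of tool results."""

    def __init__(self, task_intent: str, window: int = 5):
        self.task_intent = task_intent
        # deque with maxlen silently drops the oldest entry when full.
        self._recent = deque(maxlen=window)

    def record_result(self, tool: str, result: str) -> None:
        self._recent.append((tool, result))

    def build_input(self, pending_call: str) -> str:
        """Assemble the text handed to the evaluator for one pending tool call."""
        history = "\n".join(f"- {tool}: {result}" for tool, result in self._recent)
        return (
            f"Task intent: {self.task_intent}\n"
            f"Recent tool results:\n{history}\n"
            f"Pending tool call: {pending_call}"
        )
```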

We tried coarser units (plan nodes, full steps), but drift compounds fast: by the time a step finishes, the agent may have already chained 3-4 bad calls. Tool-level evaluation gives the tightest correction loop. The cost is ~200ms of latency per invocation. For hot paths we sample (every 3rd call, or only on tool-category changes) rather than evaluate exhaustively.
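The hot-path sampling could look something like this. The comment describes two alternative strategies (every 3rd call, or only on tool-category changes); this sketch combines them with an OR purely for illustration, and `EvalSampler` and its category strings are hypothetical.

```python
from typing import Optional


class EvalSampler:
    """Decides whether a given tool call should go to the evaluator."""

    def __init__(self, every_n: int = 3):
        self.every_n = every_n
        self._count = 0
        self._last_category: Optional[str] = None

    def should_evaluate(self, category: str) -> bool:
        self._count += 1
        # A category change (e.g. "read" -> "write") always gets evaluated.
        changed = (
            self._last_category is not None
            and category != self._last_category
        )
        self._last_category = category
        # Otherwise, evaluate every Nth call.
        return changed or self._count % self.every_n == 0
```

Note that with these defaults the very first call is not evaluated; a stricter variant might always evaluate call number one.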


nordic_lion 132 days ago

That makes sense: binding to the smallest viable control surface keeps the correction loop tight, and the sampling strategy for hot paths sounds like a pragmatic balance between latency and coverage. Thanks for the additional detail.
