| This is excellent timing. I've been running production agent workloads on K8s for a few months now, and the isolation patterns you've implemented are exactly what prevents midnight debugging sessions. A few things I've found that pair well with container isolation:

*Resource constraints*: Not just CPU/memory, but ephemeral storage too. Agents can generate surprising amounts of log/output data during long-horizon tasks. I set 5Gi ephemeral limits by default.

*Network policies*: Your Helm chart should probably include a default NetworkPolicy that blocks egress except to specific API endpoints. Agents will enumerate and try to reach anything they can see.

*Memory persistence*: The trickiest part. OpenClaw's memory system (MEMORY.md + memory/.md) assumes a persistent filesystem. Running in K8s means you need one of:
- StatefulSet with persistent volume
- External memory store (S3/minio with sync back)
- Network file system for the workspace directory

I went with StatefulSet + EBS volume for the workspace. The agent restarts with the Pod, but memory persists.

*Observability*: Since you're isolating the agent, you should also be exporting metrics. The heartbeat/execution loop in OpenClaw can emit structured logs, which a sidecar can turn into metrics for Prometheus to scrape.

Curious - did you tackle the CDP (browser automation) piece? Running Chrome in a sidecar container and connecting over the Pod network works, but the USB/keyboard simulation pieces get weird in containers. |
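
For anyone who wants a starting point, here is a rough sketch of the resource, persistence, and egress pieces as Python functions that build the manifests. It is not taken from any existing Helm chart. The function names, image, mount path, CPU/memory values, volume size, and CIDR are all placeholders I made up for illustration; only the 5Gi ephemeral limit comes from the setup described above. Note that the standard NetworkPolicy API matches on IP blocks and selectors, not hostnames, so the allowlist here assumes you know the CIDRs of the API endpoints.

```python
import json


def agent_statefulset(name="agent", image="example/agent:latest",
                      workspace_size="20Gi"):
    """Build a StatefulSet manifest (as a dict) for a single agent Pod.

    All defaults are hypothetical placeholders, apart from the 5Gi
    ephemeral-storage limit mentioned in the comment above.
    """
    labels = {"app": name}
    container = {
        "name": "agent",
        "image": image,
        "resources": {
            # Cap scratch/log output as well as CPU and memory.
            "limits": {"cpu": "2", "memory": "4Gi",
                       "ephemeral-storage": "5Gi"},
        },
        # Assumed mount path for the agent workspace (memory files live here).
        "volumeMounts": [{"name": "workspace", "mountPath": "/workspace"}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # A headless Service with this name is expected to exist.
            "serviceName": name,
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
            # One PersistentVolumeClaim per Pod; it outlives Pod restarts.
            # storageClassName is left unset, so the cluster default applies
            # (an EBS-backed class on a typical AWS cluster).
            "volumeClaimTemplates": [{
                "metadata": {"name": "workspace"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": workspace_size}},
                },
            }],
        },
    }


def egress_allowlist_policy(name, allowed_cidrs, port=443):
    """Build a NetworkPolicy that allows DNS plus TCP egress to given CIDRs."""
    if not allowed_cidrs:
        # An empty "to" list would match every destination, so refuse it.
        raise ValueError("allowed_cidrs must not be empty")
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{name}-egress"},
        "spec": {
            "podSelector": {"matchLabels": {"app": name}},
            # Listing Egress here denies all egress not allowed below.
            "policyTypes": ["Egress"],
            "egress": [
                # Rule 1: DNS lookups on port 53 to any destination.
                {"ports": [{"protocol": "UDP", "port": 53},
                           {"protocol": "TCP", "port": 53}]},
                # Rule 2: the allowlisted API endpoints only.
                {"to": [{"ipBlock": {"cidr": c}} for c in allowed_cidrs],
                 "ports": [{"protocol": "TCP", "port": port}]},
            ],
        },
    }


if __name__ == "__main__":
    # 203.0.113.0/24 is a documentation-only range; substitute real CIDRs.
    for manifest in (agent_statefulset(),
                     egress_allowlist_policy("agent", ["203.0.113.10/32"])):
        print(json.dumps(manifest, indent=2))
```

Since `kubectl apply -f` accepts JSON as well as YAML, the printed manifests can be saved to files and applied directly, or translated into Helm templates. Whether the policy is actually enforced depends on the cluster's CNI plugin supporting NetworkPolicy.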